LUNG CANCER MARKER



TECHNICAL FIELD 

The invention relates to genes and proteins specific 
for certain cancers and methods for their detection. 

BACKGROUND OF THE INVENTION

Lung cancer is the most common form of cancer in the 
world. Estimates for the year 1985 indicate that there 
were about 900,000 cases of lung cancer worldwide. 
(Parkin, et al., "Estimates of the worldwide incidence of
eighteen major cancers in 1985," Int J Cancer 1993;
54:594-606). For the United States alone, 1993
projections placed the number of new lung cancer cases at
170,000, with a mortality of about 88%. (Boring, et al.,
"Cancer statistics," CA Cancer J Clin 1993; 43:7-26).
Although the occurrence of breast cancer is slightly more 
common in the United States, lung cancer is second behind 
prostate cancer for males and third behind breast and 
colorectal cancers for women. Yet, lung cancer is the
most common cause of cancer deaths.

The World Health Organization classifies lung cancer 
into four major histological types: (1) squamous cell 
carcinoma (SCC), (2) adenocarcinoma, (3) large cell
carcinoma, and (4) small cell lung carcinoma (SCLC). (The
World Health Organization, "The World Health Organization
histological typing of lung tumours," Am J Clin Pathol
1982; 77:123-136). However, there is a great deal of
tumor heterogeneity even within the various subtypes, and 
it is not uncommon for lung cancer to have features of 
more than one morphologic subtype. The term non-small 
cell lung carcinoma (NSCLC) includes squamous cell carcinoma,
adenocarcinoma, and large cell carcinoma.

Typically, a combination of X-ray and sputum cytology 
is used to diagnose lung cancer. Unfortunately, by the
time a patient seeks medical help for symptoms, the
cancer is usually so advanced that it is incurable.
(Cancer Facts and Figures (based on rates from
NCI SEER Program 1977-1981), New York: American Cancer
Society, 1986). Routine large-scale radiologic or
cytologic screening of smokers has been investigated. 
Studies concluded that cytomorphological screening did not 
significantly reduce the mortality rate from lung cancer 
and was not recommended for routine use. ("Early lung
cancer detection: summary & conclusions," Am Rev Respir
Dis 1984; 130:565-70). However, in a subpopulation of
patients where the cancer is diagnosed at a very early
stage and the lung is surgically resected, there is a
5-year survival rate of 70-90%. (Flehinger, et al., "The
effect of surgical treatment on survival from early lung
cancer," Chest 1992; 101:1013-1018; Melamed, et al.,
"Screening for early lung cancer: results of the Memorial
Sloan-Kettering Study in New York," Chest 1984; 86:44-53).
Therefore, research has focused on early detection of 
tumor markers before the cancer becomes clinically 
apparent and while the cancer is still localized and 
amenable to therapy. 

The identification of antigens associated with lung 
cancer has stimulated considerable interest because of 
their use in screening, diagnosis, clinical management, 
and potential treatment of lung cancer. International 
workshops have attempted to classify the lung cancer 
antigens into 15 possible clusters that may define 
histologic origins. (Souhami, et al., "Antigens of lung
cancer: results of the second international workshop on
lung cancer antigens," JNCI 1991; 83:609-612). As of
1988, more than 200 monoclonal antibodies (MAbs) had been
reported to react with human lung tumors. (Radosevich, et
al., "Monoclonal antibody assays for lung cancer," In:
Cancer Diagnosis in Vitro Using Monoclonal Antibodies.
Edited by H. A. Kupchik. New York: Marcel Dekker, 1988).

MAbs for lung cancer were first developed to 
distinguish NSCLC from SCLC. (Mulshine, et al.,
"Monoclonal antibodies that distinguish nonsmall-cell from 
small-cell lung cancer," J Immunol 1983; 121:497-502). In 
most cases, the identity of the cell surface antigen with 
which a particular antibody reacts is not known, or has 
not been well characterized. (Scott, et al., "Early lung 
cancer detection using monoclonal antibodies," In: Lung 
Cancer. Edited by J. A. Roth, J.D. Cox, and W.K. Hong. 
Boston: Blackwell Scientific Publications, 1993).

MAbs have been used in the immunocytochemical 
staining of sputum samples to predict the progression of 
lung cancer. (Tockman, et al., "Sensitive and specific 
monoclonal antibody recognition of human lung cancer 
antigen on preserved sputum cells: a new approach to early 
lung cancer detection," J Clin Oncol 1988; 6:1685-1693). 
In the study, two MAbs were used: 624H12, which binds a
glycolipid antigen expressed in SCLC, and 703D4, which is
directed to a protein antigen of NSCLC. Of the sputum
specimens from participants who progressed to lung cancer,
two-thirds showed positive reactivity with either the SCLC
or the NSCLC MAb. In contrast, of those who did not
progress to lung cancer, 35 of 40 did not react with the
SCLC or NSCLC MAb. This study suggests the need for the
development of additional early detection targets to 
discover the onset of malignancy at the earliest possible 
stage. 

Carcinoembryonic antigen (CEA) is a frequently
studied tumor marker for cancers, including lung cancer.
(Nutini, et al., "Serum NSE, CEA, CT, CA 15-3 levels in
human lung cancer," Int J Biol Markers 1990; 5:198-202).
Squamous cell carcinoma antigen is another established
serum marker. (Margolis, et al., "Serum tumor markers in
non-small cell lung cancer," Cancer 1994; 73:605-609).
Other serum antigens for lung cancer include antigens
recognized by MAbs 5E8, 5C7, and 1F10, the combination of
which distinguishes patients with lung cancer from
those without. (Schepart, et al., "Monoclonal
antibody-mediated detection of lung cancer antigens in
serum," Am Rev Respir Dis 1988; 138:1434-8). Furthermore, the
combination of 5E8, 5C7 and 1F10 was more sensitive,
specific and accurate for identifying NSCLC when compared
to results from a combination of the CEA and squamous cell
carcinoma antigen tests. (Margolis, et al., Cancer 1994;
73:605-609).

Serum CA 125, initially described as an ovarian 
cancer-associated antigen, has been investigated for its 
use as a prognostic factor in NSCLC. (Diez, et al.,
"Prognostic significance of serum CA 125 antigen assay in
patients with non-small cell lung cancer," Cancer 1994;
73:1368-76). The study determined that the preoperative
serum level of CA 125 antigen is inversely correlated with 
survival and tumor relapse in NSCLC. 

Despite the numerous examples of MAb applications,
none has yet emerged that has changed clinical practice.
(Mulshine, et al., "Applications of monoclonal antibodies
in the treatment of solid tumors," In: Biologic Therapy of
Cancer. Edited by V.T. Devita, S. Hellman, and S.A.
Rosenberg. Philadelphia: JB Lippincott, 1991, pp. 563-588).
MAbs alone may not be the answer to early detection.
First, there has been only moderate success with
immunologic reagents for paraffin-embedded tissue.
Second, lung cancer may express features that cannot be
differentiated by antibodies, for example, chromosomal
deletions, gene amplification, or translocation and
alteration in enzymatic activity.

Once the gene encoding a MAb-recognized surface antigen
has been cloned, cytogenetic and molecular techniques may
provide powerful tools for screening, diagnosis,
management and ultimately treatment of lung cancer. An
example of a lung cancer antigen that has been cloned is
the adenocarcinoma-associated antigen. This antigen,
recognized by KS1/4 MAb, is an epithelial
malignancy/epithelial tissue glycoprotein from the human
lung adenocarcinoma cell line UCLA-P3. (Strand, et al.,
"Molecular cloning and characterization of a human
adenocarcinoma/epithelial cell surface antigen
complementary DNA," Cancer Res 1989; 49:314-317). The
antigen has been found on all adenocarcinoma cells tested
and in various corresponding normal epithelial cells.
Northern blot analysis indicated that transcription of the
adenocarcinoma-associated antigen was detected in RNA
isolated from normal colon but not in RNA isolated from
normal lung, prostate, or liver. Therefore, identification
of adenocarcinoma-associated antigen in lung cells may
prove to be diagnostic for adenocarcinoma.

The cloning of CEA and the nonspecific crossreacting
antigen (NCA) has allowed the development of specific DNA
probes which discriminate their expression in lung cancer
at the mRNA level. (Hasegawa, et al., "Nonspecific
crossreacting antigen (NCA) is a major member of the
CEA-related gene family expressed in lung cancer," Br J Cancer
1993; 67:58-65). NCA is a component of the CEA gene
family in lung cancer and is also recognized by anti-CEA
antibodies, especially polyclonal antibodies. Because of
the crossreactivity, investigations to analyze CEA and NCA
separately in lung disease had been difficult. The use of
DNA probes determined that lung cancer cells fall into
three different types according to their CEA and/or NCA
expression by Northern blot analysis. Specifically, lung
cancers expressed both CEA and NCA mRNA, only NCA mRNA, or
neither mRNA. CEA-related mRNA expression was always
accompanied by NCA mRNA expression and there were no cases
of CEA mRNA expression alone. The separate assessment of
CEA and NCA expression in lung cancers may be important in
determining the prognosis of lung cancers because the
antigens have been described as cell-cell adhesion
molecules and may play a role in cancer metastasis.

Another method to detect the presence of an antigen
gene or its mRNA in specific cells or to localize an
antigen gene to a specific locus on a chromosome is in
situ hybridization. In situ hybridization uses nucleic
acid probes that recognize either repetitive sequences on
a chromosome or sequences along the whole chromosome
length or chromosome segments. By tagging the probes with
radioisotopes or color detection systems, chromosome
regions can be identified within the cell. Investigations
using in situ hybridization have demonstrated numerical
chromosomal abnormalities in samples from human tumors,
including bladder, neuroectodermal, breast, gastric and
lung cancer tumors. (Kim, et al., "Interphase cytogenetics
in paraffin sections of lung tumors by non-isotopic in
situ hybridization. Mapping genotype/phenotype
heterogeneity," Am J Pathol 1993; 142:307-317).

Fluorescence in situ hybridization (FISH) allows
cells to be stained so that genetic aberrations resulting 
in changes in gene copy number or structure can be 
quantitated by fluorescence microscopy. In this 
technique, a chemically labeled single-stranded nucleic 
acid probe homologous to the target nucleic acid sequence 
is annealed to denatured nucleic acid contained in target 
cells. The cells may be mounted on a microscope slide, in 
suspension or prepared from paraffin-embedded material. 
Treating the chemically modified probes with a fluorescent 
ligand makes the bound probe visible. FISH has been used 
for (1) detection of changes in gene copy number and gene 
structure; (2) detection of genetic changes, even in low 
frequency subpopulations; and (3) detection and
measurement of the frequency of residual malignant cells.
(Gray, et al., "Molecular cytogenetics in human cancer
diagnosis," Cancer 1992; 69:1536-1542). 

Other molecular markers for lung cancer include 
oncogenes and tumor suppressor genes. Dominant oncogenes 
are activated by mutation and lead to deregulated cellular 
growth. Such genes code for proteins that function as 
growth factors, growth factor receptors, signal 
transducing proteins and nuclear proteins involved in 
transcriptional regulation. Amplification, mutation, and 
translocations have been documented in many different 
cancer cells and have been shown to lead to gene 
activation or overexpression. 

The ras family of oncogenes comprises a group of 
membrane associated GTP-binding proteins thought to be 
involved in signal transduction. Mutations within the ras 
oncogenes, resulting in sustained growth stimulation, have 
been identified in 15 to 30% of human NSCLC. (Birrer, et
al., "Application of molecular genetics to the early
diagnosis and screening of lung cancer," Cancer Res 1992;
52(suppl):2658s-2664s). Patients with tumors containing
ras mutations had decreased survival compared with 
patients whose tumors had no ras mutations. Polymerase
chain reaction (PCR) amplified ras genes can be
analyzed for the presence of mutations by several
methods: (a) differential hybridization of 32P-labeled
mutated oligonucleotides; (b) identification of new
restriction enzyme sites created by the activating
mutation; (c) single-strand conformational polymorphisms;
and (d) nucleic acid sequencing. These methods combined
with PCR technology could allow detection of an activated
ras gene from sputum specimens.

Another family of dominant oncogenes, the erb B 
family, has been found to be abnormally expressed in lung 
cancer cells. This group codes for membrane-associated 
tyrosine kinase proteins and contains erb B1, the gene
coding for the epidermal growth factor (EGF) receptor, and
erb B2 (also called Her-2/neu). The erb B1 gene has been
found to be amplified in NSCLC (up to 20% of squamous cell
tumors), while the EGF receptor has been shown to be
overexpressed in many NSCLC cells (approximately 90% of
squamous cell tumors, 20 to 75% of adenocarcinomas, and
rarely in large cell or undifferentiated tumors).
(Birrer, et al., Cancer Res 1992; 52(suppl):2658s-2664s).
Amplification of the related oncogene erb B2 (Her-2/neu)
occurs infrequently in lung cancer but is a negative
prognostic factor in breast cancer. However,
overexpression of the erb B2 protein product, p185neu,
occurs in some NSCLC and may be related to poor prognosis.
(Kern, et al., "p185neu expression in human lung
adenocarcinomas predicts shortened survival," Cancer Res
1990; 50:5184-5191).

A third family of dominant oncogenes involved in lung
cancer is the myc family. These genes encode nuclear
phosphoproteins, which have potent effects on cell growth
and which function as transcriptional regulators. Unlike
ras genes, which are activated by point mutations in lung
cancer cells, the myc genes are activated by
overexpression of the cellular myc genes, either by gene
amplification or by rearrangements, each ultimately
leading to increased levels of myc protein. Amplification
of the normal myc genes is seen frequently in SCLC and
rarely in NSCLC.

The loss or inactivation of tumor suppressor genes
may also be an important step in the pathway leading to
invasive cancer. Tumor suppressor genes function normally
to suppress cellular proliferation, and since they are
recessive oncogenes, mutations or deletions must occur in
both alleles of these genes before transformation occurs.

A phosphoprotein, p53, which is encoded by a gene
located on chromosome 17p, suppresses transformation in
its wild-type state. In its mutant state, p53 acts
as a dominant oncogene. p53 functions in DNA binding and
transcription activation. Mutations of p53 have been
found in many human cancers including colon, breast, brain
and lung cancer cells. (Birrer, et al., Cancer Res 1992;
52(suppl):2658s-2664s). In NSCLC cell lines,
p53 mutations have been found at a rate of up to 74%.
(Mitsudomi, et al., "p53 gene mutations in non-small-cell
lung cancer cell lines and their correlation with the
presence of ras mutations and clinical features," Oncogene
1992; 7:171-180).

WO 96/02552 PCT/US95/09145

Despite all of the advances made in the area of lung 
cancer, medical and surgical intervention has resulted in 
little change in the 5-year survival rate for lung cancer 
patients. Early detection holds the greatest hope for 
successful intervention. There remains a need for a 
practical method to diagnose lung cancer as close to its 
inception as possible. In order for early detection to be
feasible, it is important that specific markers be found 
and their sequences elucidated. 

A lung cancer marker antigen, specific for NSCLC, has 
now been found, sequenced, and cloned. The antigen is 
useful in methods for detection of non-small cell lung 
cancer and for potential production of antibodies and 
probes for treatment compositions. 

BRIEF DESCRIPTION OF THE DRAWING

FIGURE 1 depicts the alignment of the amino acid 
sequence of HCAVIII with previously described carbonic 
anhydrases. Conserved amino acids are shown in bold. 

SUMMARY OF THE INVENTION

The invention concerns a lung cancer antigen 
(HCAVIII) gene specific for non-small cell lung cancer. 

In one embodiment, the invention relates to a 
substantially purified nucleic acid (SEQ ID NO: 1) encoding
the pre-protein sequence shown in SEQ ID NO: 2.

In other embodiments, the invention relates to cDNAs
which encode the mature form of the protein (SEQ ID NO: 4),
or a truncated form of the protein lacking the
transmembrane domain (SEQ ID NO: 13 and SEQ ID NO: 15), or a
protein in which one or more of the amino acids in the
phosphorylation region have been altered to affect that
function, an example of which is shown in SEQ ID NO: 18.

In other embodiments, proteins encoded by the cDNA of
SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 12, SEQ
ID NO: 14, and SEQ ID NO: 17 are provided. 

In another aspect, the invention relates to a 
recombinant DNA clone for HCAVIII. 

In further aspects of the invention, expression 
vectors for HCAVIII and modifications thereof are an 
object. 

The invention further relates to methods of detecting
lung cancer.

In one aspect an in situ hybridization technique is
provided. In another aspect, a fluorescence in situ 
hybridization technique is provided. In a further aspect, 
an ELISA assay is provided. In another aspect, detection 
of carbonic anhydrase activity which correlates with lung 
cancer antigen is provided. 

DETAILED DESCRIPTION OF THE INVENTION

The nucleic acid sequence coding for a cell surface 
protein (said protein hereinafter designated HCAVIII)
which is highly specific for non-small cell lung cancer
cells has now been obtained. This gene sequence will 
facilitate detection and treatment of the disease, which 
to date has often proven difficult. 

The HCAVIII cDNA in the vector pLC56 has been 
sequenced and characterized including the entire coding 
region and substantially all of the upstream and 
downstream non-translated regions. The cDNA in pLC56 was 
sequenced on both strands from exonuclease III-generated
deletions and subsequent subcloning into M13 vectors or
directly from the cloning vectors using the dideoxy
method and a SEQUENASE® Version 2.0 kit (U.S.
Biochemicals, Cleveland, OH). Additional regions of DNA
were subcloned as small restriction fragments into the
same vectors for sequence analysis. Overlapping segments
were ordered using MacVector Align software (Kodak/IBI
Technologies, New Haven, CT). SEQ ID NO: 1 represents the
cDNA encoding HCAVIII and a presumed signal peptide. SEQ
ID NO: 2 represents the signal peptide (amino acid residues
-29 to -1) followed by the mature protein (amino acid
residues 1 to 325). As predicted from the cDNA sequence
in pLC56, a protein of about 354 amino acids is encoded
with a predicted size of 39448 daltons. A
hydrophilicity plot (MacVector software, Kodak/IBI
Technologies) of this protein provided strong evidence of
a leader peptide at the N-terminus and a membrane-spanning
segment near the C-terminus. The membrane-spanning
segment provides evidence that this protein is membrane
bound, as also predicted by its positive selection with
panning methodology (see Watson, et al., Recombinant DNA,
2nd ed., pp. 115-116, 1992). The cleavage site of the
signal peptide as predicted by von Heijne (von Heijne, Gunnar,
Nucleic Acids Res 1986; 14:4683-4690) is 29 amino acids
down from the N-terminal methionine. SEQ ID NO: 3
corresponds approximately to the coding region of the
mature polypeptide. The subsequent "mature" protein is
proposed to be 325 amino acids, initiating with serine,
and of a calculated 36401 daltons and a pI of 6.42 (SEQ ID
NO: 4).
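
The residue numbering and sizes quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the helper names are hypothetical, the average water mass of 18.02 Da is a standard value that is not stated in the text, and the signal peptide mass it derives is an estimate from the two quoted sizes rather than a figure given in this description.

```python
SIGNAL_LEN = 29    # signal peptide, residues -29 to -1 (from the text)
MATURE_LEN = 325   # mature protein, residues 1 to 325 (from the text)
WATER_DA = 18.02   # assumption: average mass of water, not stated in the text


def patent_position(index0: int) -> int:
    """Map a 0-based index in the pre-protein to the -29..-1, 1..325 numbering."""
    if not 0 <= index0 < SIGNAL_LEN + MATURE_LEN:
        raise ValueError("index outside the 354-residue pre-protein")
    offset = index0 - SIGNAL_LEN
    # There is no residue 0: the signal peptide ends at -1 and the
    # mature protein starts at 1.
    return offset if offset < 0 else offset + 1


def signal_peptide_mass(pre_mass_da: float, mature_mass_da: float) -> float:
    """Estimate the free signal peptide mass from the two quoted sizes.

    Cleaving one peptide bond adds one water, so
    pre-protein = signal peptide + mature protein - water.
    """
    return pre_mass_da - mature_mass_da + WATER_DA


total_length = SIGNAL_LEN + MATURE_LEN             # 354 residues
estimate = signal_peptide_mass(39448.0, 36401.0)   # roughly 3065 Da
```

The total of 354 residues agrees with the "about 354 amino acids" predicted from the cDNA, and the estimated signal peptide mass of roughly 3065 Da corresponds to about 106 Da per residue, which is a plausible average for a 29-residue peptide.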

Homology searches against NCBI BlastN or BlastX 
version 1.3.12MP (National Center for Biotechnology 
Information, Bethesda, MD) provided evidence the gene and 
protein are novel, not previously identified in either 
database. (Altschul, et al., "Basic local alignment 
search tool," J Mol Biol 1990; 215:403-410). Additional 
searches against another database (Entrez, version 9) gave 
similar results. 

The isolation of a second cDNA encoding HCAVIII
permitted the identification of new sequences within the
5' and 3' untranslated regions of this gene. SEQ ID
NO: 5, a cDNA encoding HCAVIII and a portion of the 5' and
3' nontranslated regions, has substantial identity with
[text illegible in source]
to positions 85-1188 of SEQ ID NO: 5). The encoded protein
is listed in SEQ ID NO: 6 and is identical with SEQ ID
NO: 2. Homology searches of NCBI BlastN against SEQ ID
NO: 5 showed these gene sequences have not been previously
identified. SEQ ID NO: 7 represents additional cDNA
sequences of the 3' nontranslated region of the HCAVIII
gene located downstream from the sequences depicted in SEQ
ID NO: 5. Homology searches against the same database
identified two clones with homology to SEQ ID NO: 7. Both
sequences are expressed sequence tags (ESTs), the first
EST04899 (345 bp) and the second HUMGS04024 (466 bp).

Alignment searches indicate this protein shares 
common features with the seven human carbonic anhydrase 
proteins previously identified. However, as described 
below, certain structural features distinct to HCAVIII 
exist that may confer unique properties to this protein 
and a role in the transformation pathway to tumorigenicity.
This group of enzymes catalyzes the hydration of carbon
dioxide

$$\mathrm{CO_2 + H_2O \rightleftharpoons HCO_3^- + H^+}$$

and in reverse the dehydration of HCO3-. This protein is
identified as a carbonic anhydrase (CA) based on the
conservation of amino acids at positions critical for the
binding of Zn2+ and the catalysis of CO2 hydration, as well as
numerous other conserved amino acids (see Fig. 1). The
protein is 34 to 64 amino acids longer (at the C-terminus)
than any previously reported carbonic anhydrase by virtue
of the membrane-spanning region also found in HCAIV and an
additional approximately 30 amino acids located on the
cytoplasmic side of the cell and apparently missing in
other human CA isoforms. In addition, this intracellular
domain contains a phosphorylation site recognized by
protein kinase C and other kinases, as defined by the
motif "Arg-Arg-Lys-Ser" (SEQ ID NO: 8 and SEQ ID NO: 9)
(amino acid residues 1-4 in SEQ ID NO: 9 and amino acid
residues 299-302 in SEQ ID NO: 2, SEQ ID NO: 4 and SEQ ID
NO: 6).
Interestingly, this motif is found only in HCAVIII,
and at a functionally significant site, i.e., within the
cytosol. A surface cleft essential for enzymatic function
present on other carbonic anhydrases is conserved in this
protein, suggesting that this protein will also possess
enzymatic activity. Five possible N-glycosylation sites
are predicted by the primary amino acid sequence and the
motif "Asn-Xaa-Ser(Thr)", beginning at amino acid
residues -2, 51, 133, 151, and 202 in SEQ ID NO: 2,
respectively.
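
Motif predictions of this kind can be reproduced by scanning a one-letter amino acid sequence. The Python sketch below is a minimal illustration, not the method used to obtain the positions quoted above: the function names and the toy sequence are hypothetical (the actual SEQ ID NO: 2 sequence is not reproduced here), and the sequon is matched exactly as the text writes it, with any residue allowed at Xaa.

```python
import re

# "Asn-Xaa-Ser(Thr)" in one-letter code. The lookahead lets overlapping
# sites be reported, since finditer would otherwise skip past each match.
SEQUON = re.compile(r"(?=N[A-Z][ST])")

# "Arg-Arg-Lys-Ser" in one-letter code.
KINASE_MOTIF = "RRKS"


def find_sequons(protein: str) -> list[int]:
    """Return 0-based start indices of every Asn-Xaa-Ser/Thr match."""
    return [match.start() for match in SEQUON.finditer(protein.upper())]


def find_motif(protein: str, motif: str = KINASE_MOTIF) -> list[int]:
    """Return 0-based start indices of every exact occurrence of motif."""
    protein = protein.upper()
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append(start)
        start = protein.find(motif, start + 1)
    return hits


# Hypothetical toy sequence, used only to exercise the two scans.
TOY_PROTEIN = "MANQSAAANGTLLRRKSGG"
sequon_hits = find_sequons(TOY_PROTEIN)   # NQS at index 2, NGT at index 8
kinase_hits = find_motif(TOY_PROTEIN)     # RRKS at index 13
```

Indices returned here are 0-based offsets into whatever sequence is supplied; converting them to the -29 to -1 and 1 to 325 numbering of SEQ ID NO: 2 would require subtracting the signal peptide length and skipping zero.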

HCAVIII is expressed at a much higher level in a
non-small cell lung cancer cell line (A549) than in normal
lung tissue, other normal tissues, and other tumor cell
lines, which makes it useful in distinguishing this
disease. This is clearly demonstrated in Table 1. Data
for this table were obtained as follows. Total cellular
RNA was isolated from the indicated actively growing cell
lines as described by Chirgwin, et al., "Isolation of
biologically active ribonucleic acid from sources enriched
in ribonuclease," Biochemistry 1979; 18:5294-5299. RNA
samples were fractionated over a 1% agarose-formaldehyde
gel and transferred to a nylon membrane (Qiagen,
Chatsworth, CA) by capillary action. The hybridization
[text illegible in source]
restriction fragment isolated from pLC56, a plasmid
harboring the HCAVIII gene in its initial isolation. This
fragment was radiolabeled with 32P using a PRIME-IT®
Random Primer Labeling Kit obtained from Stratagene, La
Jolla, CA. A membrane containing RNA derived from healthy
human tissue was purchased from Clontech Laboratories,
Inc., Palo Alto, CA. RNA blots were hybridized in a
standard cocktail containing 32P-labeled probe at 42°C
overnight then exposed to X-ray film. The same blots were
subsequently, upon removal of the probe, rehybridized with
a second 32P-labeled DNA from β-actin to serve as a
positive control for integrity of the blotted RNA.

As shown in Table 1, normal lung tissue does not
express the HCAVIII gene in detectable amounts. Other
tumor cell lines fail to express the gene, or express it
only in minor amounts, which will allow easy distinction
of non-small cell carcinomas.

TABLE 1. NORTHERN BLOTS USING HCAVIII cDNA AGAINST NORMAL
TISSUES AND TUMOR CELL LINES

TISSUE                          mRNA (kb)    INTENSITY

NORMAL TISSUE
heart                           nd(1)
brain                           4.5          1X(2)
placenta                        4.5          1X
lung                            nd
liver                           nd
skeletal muscle                 nd
kidney                          4.5          100X
pancreas                        4.5          10X

TUMOR CELL LINE
A549 (lung carcinoma)           3.5          5000X
                                5.4          50X
                                8.0          25X
                                9.0          25X
BT20 (breast carcinoma)         nd
G361 (melanoma)                 nd
HT144 (melanoma)                nd
U937 (histiocytic lymphoma)     nd
KG-1 (myelogenous leukemia)     nd

(1) nd = none detected
(2) 1X = at limit of detection
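
To make the size of the difference in Table 1 concrete, the band intensities can be compared directly. The Python sketch below is only an illustration: the dictionary layout and function name are hypothetical, intensities are taken as multiples of the detection limit (1X), and "nd" entries are represented as empty band lists.

```python
# (transcript size in kb, intensity as a multiple of the detection limit)
NORMAL_TISSUE = {
    "heart": [],
    "brain": [(4.5, 1)],
    "placenta": [(4.5, 1)],
    "lung": [],
    "liver": [],
    "skeletal muscle": [],
    "kidney": [(4.5, 100)],
    "pancreas": [(4.5, 10)],
}
TUMOR_CELL_LINE = {
    "A549": [(3.5, 5000), (5.4, 50), (8.0, 25), (9.0, 25)],
    "BT20": [],
    "G361": [],
    "HT144": [],
    "U937": [],
    "KG-1": [],
}


def strongest_band(bands: list[tuple[float, int]]) -> int:
    """Return the highest band intensity, or 0 when none was detected."""
    return max((intensity for _size, intensity in bands), default=0)


highest_normal = max(strongest_band(b) for b in NORMAL_TISSUE.values())
a549_signal = strongest_band(TUMOR_CELL_LINE["A549"])
fold_over_normal = a549_signal / highest_normal
```

On these figures the 3.5 kb A549 band is 50 times stronger than the strongest normal-tissue signal (kidney), while normal lung shows no detectable signal at all.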

In one embodiment of the invention, probes are made
corresponding to sequences of the cDNA shown in SEQ ID
NO: 3, which are complementary to the mRNA for HCAVIII.
These probes can be radioactively or non-radioactively
labeled in a number of ways well known to the art. The
probes can be made of various lengths. Such factors as
stringency and GC content may influence the desired probe
length for particular applications. The probes correspond
to a length of 10-986 nucleotides from SEQ ID NO: 3. The
labeled probes can then be bound to detect the presence or
absence of mRNA encoding the HCAVIII in biopsy material
through in situ hybridization. The mRNA is expected to be
associated with the presence of non-small cell tumors and
to be a marker for the precancerous condition as well.
In situ hybridization provides a specificity to the
target tissue that is not obtainable in Northern, PCR or
other probe-driven technologies. In situ hybridization
permits localization of signal in mixed-tissue specimens
commonly found in most tumors and is compatible with many
histologic staining procedures. This technique is
composed of three basic components. First is the
preparation of the tissue sample provided by the
pathologist to permit successful hybridization to the
probe. Second is the preparation of the hybridization
probe, typically an RNA complementary to the mRNA of the
gene of interest (i.e., antisense RNA). RNA probes are
preferred over DNA probes for in situ hybridizations
mainly because background hybridization of the probe to
irrelevant nucleic acids or nonspecific attachment to cell
debris or subcellular organelles can be eliminated with
RNAse treatment post-hybridization. Third is the
hybridization and post-hybridization detection. Typically 
the RNA transcript probe has been radiolabeled by the
incorporation of 32P or 35S nucleotides to permit
subsequent detection of the probed specimen by 
autoradiography or quantitation of silver grains following 
treatment with autoradiographic emulsion. Nonradioactive 
detection systems have also been developed. In one 
example, biotinylated nucleotides can be substituted for 
the radioactive nucleotide in the RNA probe preparation, 
permitting visualization of the probed sample by 
immunocytochemistry-derived techniques. Example 1 
describes in situ hybridization procedures using RNA 
probes derived from the HCAVIII gene. Example 2 provides
exemplary fluorescent in situ (FISH) hybridization 
procedures. 
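
The probe considerations discussed above (length within 10-986 nucleotides, GC content, and antisense orientation) can be sketched computationally. The following Python example is a hedged illustration and not the procedure of Example 1: the function names, the default GC window of 0.4 to 0.6, and the toy input sequence are all assumptions introduced here, and SEQ ID NO: 3 itself is not reproduced.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def antisense_rna(sense_cdna: str) -> str:
    """Return the antisense RNA that would hybridize to the mRNA."""
    return reverse_complement(sense_cdna).replace("T", "U")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_probes(sense_cdna, length, min_gc=0.4, max_gc=0.6):
    """List (start, antisense probe) pairs for windows in the GC range.

    The 10-986 nucleotide bound comes from the text; the default GC
    window is an illustrative assumption.
    """
    if not 10 <= length <= 986:
        raise ValueError("probe length must be 10-986 nucleotides")
    probes = []
    for start in range(len(sense_cdna) - length + 1):
        window = sense_cdna[start:start + length]
        if min_gc <= gc_fraction(window) <= max_gc:
            probes.append((start, antisense_rna(window)))
    return probes


# Hypothetical toy sense-strand sequence, not taken from SEQ ID NO: 3.
TOY_CDNA = "AAAAAGGGGGCCCCC"
toy_probes = candidate_probes(TOY_CDNA, 10, min_gc=0.5, max_gc=1.0)
```

In practice the GC window and probe length would be chosen together with the hybridization stringency, as the text notes; the sliding-window filter here only shows how those two factors interact.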

The cDNA for HCAVIII (SEQ ID NO: 3) is currently in an
expression vector which is used to generate the protein
in E. coli. This expression system, described in Example 3,
produces HCAVIII to be used as an antigen for the
generation of antibodies (Example 4) for use in an ELISA
assay to detect shed HCAVIII in body fluids as described
in Example 5. The methods for production of antibodies
and ELISA type assays are well known in the art.
Exemplary methods and components of these procedures have
been chosen and developed and are described in Examples 4
and 5.

The expression and purification of foreign proteins 
in E. coli is often problematic. On occasion, the protein 
is expressed at high levels but is deposited within the 
cell as an insoluble, denatured form termed an inclusion 
body. These bodies are often observed when the foreign 
protein contains a hydrophobic domain, such as found in 
the membrane spanning segment of HCAVIII. Through 
recombinant DNA technology, the DNA sequences encoding the
membrane spanning segment of HCAVIII are deleted. The
protein expressed in E. coli from this engineered plasmid
is now in a soluble and native form within the cell,
permitting a rapid and less harsh purification. In
addition, the ELISA test to measure HCAVIII shed into body
fluids as described in Example 5 relies on the recombinant
protein produced from E. coli. Typically, the shed
antigen is a membrane-bound receptor that was released
from the membrane spanning segment anchoring it to the
cell. Consequently, the recombinant HCAVIII, engineered to
remove the membrane spanning segment, is a more accurate
representation of the putative HCAVIII shed antigen found 
in specimens and may prove to be the preferred antigen for 
polyclonal antisera and monoclonal antibody production as 
described for the development of an ELISA test. 

To produce the engineered plasmid, a first plasmid is 
constructed by cleaving pLC56 with the restriction enzyme 
Tth111 I, followed by treatment with T4 DNA polymerase and 
dGTP, dATP, dTTP and dCTP, and finally with alkaline 
phosphatase to remove 5'-terminal phosphates. The DNA 
sample is then purified by phenol/chloroform extraction 
and ethanol precipitation. The sample is digested with 
the restriction endonuclease BspEI, then the fragments are 
resolved by agarose gel electrophoresis to permit the 
isolation of a 267 base pair fragment. A second plasmid, 
described previously for expression of the HCAVIII mature 
protein (SEQ ID NO: 4), is cleaved with EcoRI and BspEI 
followed by alkaline phosphatase treatment and 
purification by phenol/chloroform extraction and ethanol 
precipitation. Two oligonucleotides are synthesized, 
being 5'-TGAGTCGACG (SEQ ID NO: 10) and 5'-AATTCGTCGACTCA 
(SEQ ID NO: 11), that complement each other and, upon 
annealing, provide a termination codon (TGA) and sequence 
complementary to EcoRI cleaved DNA. Finally, the two 
oligonucleotides, the 267 base pair fragment, and the 
BspEI/EcoRI cleaved plasmid are combined in a ligation 
reaction, and the resultant plasmid, which contains the 
truncated DNA sequence (SEQ ID NO: 12), is used to transform 
competent E. coli. Upon expression in E. coli, the 
resulting truncated protein (SEQ ID NO: 13) is 271 amino 
acids as determined by SDS polyacrylamide electrophoresis 
and of a size consistent with other HCA's but lacking the 
membrane spanning segment and the intracellular domain. A 
second plasmid encoding a HCAVIII truncated protein (SEQ 
ID NO: 14) lacking the membrane spanning segment and 
intracellular domain was created as described above, 
except that restriction enzyme Ple I was substituted for 
Tth111 I, resulting in a gel purified DNA fragment of 276 
base pairs. Upon expression in E. coli, the resulting 
protein is now 274 amino acids (SEQ ID NO: 15). 
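
The pairing of the two linker oligonucleotides can be checked with a short script. The following Python sketch is illustrative only: the helper names are hypothetical, and it assumes the sequences are exactly as printed for SEQ ID NO: 10 and SEQ ID NO: 11.

```python
# Illustrative check of the stop-codon linker; helper names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

TOP = "TGAGTCGACG"         # SEQ ID NO: 10 as printed, 5'->3'
BOTTOM = "AATTCGTCGACTCA"  # SEQ ID NO: 11 as printed, 5'->3'


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def linker_overhang(top: str, bottom: str) -> str:
    """Return the single-stranded 5' extension of `bottom` after annealing.

    Assumes the 3' end of `bottom` pairs with the whole of `top`.
    """
    paired = reverse_complement(top)
    if not bottom.endswith(paired):
        raise ValueError("oligonucleotides are not complementary")
    return bottom[: len(bottom) - len(paired)]


if __name__ == "__main__":
    print("first codon of top strand:", TOP[:3])
    print("5' extension of bottom strand:", linker_overhang(TOP, BOTTOM))
```

With the printed sequences, the top strand opens with the TGA termination codon and the bottom strand carries an AATT 5' extension, which is the cohesive end left by EcoRI cleavage.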

An understanding of protein phosphorylation and its 
role in the mechanism of cell transformation has been 
actively pursued, most notably with tyrosine 
phosphorylation and oncogene activation. The role of 
protein kinases including protein kinase C has been 
studied extensively with respect to signal transduction, 
but its role in oncogenesis is less clear. To provide a 
valuable tool to be used in the study of the role of 
HCAVIII serine phosphorylation in oncogenesis, an altered 
cDNA can be prepared to code for an altered protein. 
Changes to amino acids other than "Gly" may be realized by 
alterations to the oligonucleotide sequence (SEQ ID NO: 16) 
used to encode the selected residue. Other modifications 
to alter the serine phosphorylation site would utilize the 
described technology to modify either or both "Arg" residues 
located within SEQ ID NO: 9 or amino acid residues 299 and 
300 of SEQ ID NO: 2, SEQ ID NO: 4 and SEQ ID NO: 6. Since 
"Arg" residues contain a net positive charge, the 
substituted amino acids would preferably be "Lys" or 
"His," also positively charged amino acids. An exemplary 
plasmid is produced in which the "Ser" codon (amino acid 
residue 4 of SEQ ID NO: 9; amino acid residue 302 in SEQ ID 
NO:2, SEQ ID NO: 4 and SEQ ID NO: 6), is converted to a 
"Gly" codon using an in vitro mutagenesis technique 
described in Example 3 and previously recited in Kunkel, 
Thomas, "Rapid and efficient site-specific mutagenesis 
without phenotypic selection," Proc Natl Acad Sci USA 
1985; 82:488-492, and the oligonucleotide 
5'-CTTTTTTGATACCCTTCCTTCTGAA (SEQ ID NO: 16) (located in SEQ 
ID NO: 1 at the base pairs 1010-1034 with 1022 as the 
mutagenized base pair). The DNA sequences containing the 
HCAVIII gene engineered for production of the mature 
protein and mutagenized codon are released from the 
mutagenesis vector by BamHI and EcoRI restriction 
endonucleases and ligated into pGEX4T-1 cleaved with the 
same enzymes, and the resultant plasmid is used to 
transform competent E. coli. The codon mutagenesis is 
confirmed by DNA sequence analysis, and the protein is 
expressed and purified from E. coli as described in 
Example 3. The DNA sequence of the altered plasmid as 
shown in SEQ ID NO: 17 differs from the gene encoding the 
mature protein (SEQ ID NO: 3) in that the nucleotide 1022 
is changed from "A" to "G", and the protein sequence (SEQ 
ID NO: 18) expressed by the altered plasmid is identical to 
the mature protein (SEQ ID NO: 4) except that amino acid 
residue 302 is changed from "Ser" to "Gly." 
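
The effect of the mutagenic oligonucleotide can be traced with a few lines of code. This Python sketch is a hedged illustration, not part of the disclosed method: the function names and the five-entry codon table are hypothetical helpers, the oligonucleotide is taken as printed (SEQ ID NO: 16, antisense), and the coordinates (1010 to 1034, mutagenized base 1022, mature peptide starting at base 119, phosphorylation site at 1013 to 1024) are those stated for SEQ ID NO: 1.

```python
# Illustrative trace of the Ser -> Gly change; all helper names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CODONS = {"AGA": "Arg", "AGG": "Arg", "AAG": "Lys", "AGT": "Ser", "GGT": "Gly"}

OLIGO = "CTTTTTTGATACCCTTCCTTCTGAA"  # SEQ ID NO: 16 as printed (antisense)
OLIGO_START = 1010    # first base pair of SEQ ID NO: 1 covered by the oligo
MUTATED_BASE = 1022   # mutagenized base pair
SITE_START = 1013     # start of the annotated phosphorylation site
MATURE_START = 119    # first base of the mature peptide


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutated_codon(oligo: str = OLIGO) -> str:
    """Return the sense-strand codon that begins at the mutagenized base."""
    sense = reverse_complement(oligo)
    offset = MUTATED_BASE - OLIGO_START
    return sense[offset:offset + 3]


def mature_residue_number(position: int) -> int:
    """Convert a base position into a residue number of the mature protein."""
    return (position - MATURE_START) // 3 + 1


def site_residues(oligo: str = OLIGO) -> list:
    """Translate the four codons of the site at 1013..1024 in the mutant."""
    sense = reverse_complement(oligo)
    start = SITE_START - OLIGO_START
    return [CODONS[sense[i:i + 3]] for i in range(start, start + 12, 3)]


if __name__ == "__main__":
    codon = mutated_codon()
    print(codon, CODONS[codon], mature_residue_number(MUTATED_BASE))
    print(site_residues())
```

Under these assumptions the sense strand carries GGT at base 1022, restoring the "A" gives AGT ("Ser"), and the codon falls at residue 302 of the mature protein, with the two "Arg" codons at residues 299 and 300.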

Another way to detect the presence of increased 
HCAVIII could be to assay for levels of carbonic anhydrase 
activity in biopsy materials as described in Example 6. 
This should be a useful test as HCAVIII, although it is an 
immunologically unique molecule, contains small but 
distinct regions which are conserved between previously 
reported carbonic anhydrase proteins. 

In another embodiment of the invention, primers are 
made complementary to the HCAVIII cDNA (SEQ ID NO: 3) for 
detecting expression of the gene. PCR amplification of 
cDNA from lung biopsy cells would indicate the presence of 
the same non-small cell lung carcinoma. 

Due to the non-small cell lung cancer specificity of 
HCAVIII and the gene encoding the protein, antibodies 
specific for HCAVIII would also exhibit non-small cell 
lung cancer specificity which can be employed for 
diagnostic detection of HCAVIII in body fluids such as 
serum or urine or HCAVIII containing cells. Targeting of 
cancer therapeutic drugs to HCAVIII containing cells can 
also be developed using HCAVIII specific antibodies. The 
genetic expression of the gene encoding HCAVIII could be 
modulated by drugs or anti-sense technology resulting in 
an alteration of the cancer state of the HCAVIII 
containing cells. 

Example 1 

In Situ Hybridization using RNA Probes 
Derived from the HCAVIII Gene 

Tissue samples are treated with 4% paraformaldehyde 
(or equivalent fixative), dehydrated in sequential ethanol 
solutions of increasing concentrations (e.g., 70%, 95% and 
100%) with a final xylene incubation (see Current 
Protocols in Molecular Biology, pp. 14.01-14.3 and 
Immunocytochemistry II: IBRO Handbook Series: Methods in 
the Neurosciences Vol 14; pp 281-300, incorporated herein 
by reference). The tissue is embedded in molten paraffin, 
molded in a casting block and can be stored at room 
temperature. Tissue slices, typically 8 um thick, are 
prepared with a microtome, dried onto gelatin-treated 
glass slides and stored at -20°C. 

DNA sequences from the HCAVIII gene (SEQ ID NO: 3) are 
subcloned into a plasmid engineered for production of RNA 
probes. In this example, a 776 bp DNA fragment is 
released from a pLC56 plasmid following BamHI/AccI 
digestion, where the BamHI site has been created by in 
vitro mutagenesis (see E. coli expression below). This 
fragment is ligated into pGEM-2 (Promega Biotec, Madison, 
WI) that was cleaved with BamHI and AccI and transformed 
into competent E. coli. This constructed plasmid contains 
the T7 RNA polymerase promoter downstream of the AccI 
restriction site and hence can drive transcription of the 
antisense HCAVIII sequences defined by the BamHI/AccI 
fragment. Following linearization of the resulting 
plasmid with BamHI, an in vitro transcription reaction 
composed of transcription buffer (40 mM Tris-HCl, pH 7.5, 
6 mM MgCl2, 2 mM spermidine, 10 mM NaCl, 10 mM 
dithiothreitol, 1 U/ul ribonuclease inhibitor), linearized 
plasmid, 10 mM GTP, 10 mM ATP, 10 mM CTP, 100 uCi of 
(35S)UTP, and T7 RNA polymerase is incubated at 37°C. 
Multiple RNA copies of the gene are produced that then are 
used as a hybridization probe. The reaction is terminated 
by the addition of DNAase, and the synthesized RNA is 
recovered from unincorporated nucleotides by 
phenol/chloroform extraction and sequential ethanol 
precipitations in the presence of 2.5 M ammonium acetate. 

The slides containing fixed, sectioned tissues are 
rehydrated in decreasing concentrations of ethanol (100%, 
70% and 50%), followed by sequential treatments with 0.2 N 
HCl, 2X SSC (where 20X SSC is 3 M NaCl and 0.3 M sodium 
citrate) at 70°C to deparaffinate the sample, phosphate 
buffered saline (PBS), fixation in 4% paraformaldehyde and 
PBS wash. The slides are blocked to prevent nonspecific 
binding by the sequential additions of PBS/10 mM 
dithiothreitol (45°C), 10 mM dithiothreitol/0.19% 
iodoacetamide/0.12% N-ethylmaleimide and PBS wash. The 
slides are equilibrated in 0.1 M triethylamine, pH 8.0, 
followed by treatment in 0.1 M triethylamine/0.25% acetic 
anhydride and 0.1 M triethylamine/0.5% acetic anhydride 
and washed in 2X SSC. The slides are then dehydrated in 
increasing concentrations of ethanol (50%, 70% and 100%) 
and stored at -80°C. 

A hybridization mix is prepared by combining 50% 
deionized formamide, 0.3 M NaCl, 10 mM Tris-HCl, pH 8.0, 1 
mM EDTA, 1X Denhardt's solution (0.02% Ficoll 400, 0.02% 
polyvinylpyrrolidone, 0.02% bovine serum albumin (BSA)), 
500 ug/ml yeast tRNA, 500 ug/ml poly (A), 50 mM 
dithiothreitol, 10% polyethyleneglycol 6000 and the 
35S-labeled RNA probe. This solution is placed on the fixed, 
blocked tissue slides which are then incubated at 45°C in 
a moist chamber for 0.5 to 3 hours. The slides are washed 
to remove unbound probe in 50% formamide, 2X SSC, 20 mM 
2-mercaptoethanol (55°C), followed by 50% formamide, 2X SSC, 
20 mM 2-mercaptoethanol and 0.5% Triton-X 100 (50°C) and 
finally in 2X SSC/20 mM 2-mercaptoethanol (room 
temperature). The slides are treated with 10 mM Tris-HCl, 
pH 8.0/0.3 M NaCl/40 ug/ml RNase A/2 ug/ml RNase T1 (37°C) 
to reduce levels of unbound RNA probe. Following RNase 
treatment, the slides are washed in formamide/SSC buffers 
at 50°C/room temperature and then dehydrated in 
increasing ethanol concentrations containing 0.3 M 
ammonium acetate, and one final 100% ethanol wash. The 
slides are then exposed to X-ray film followed by emulsion 
autoradiography to detect silver grains. 

Test tissue samples are compared to matched controls 
derived from normal lung tissue. Evidence of elevated 
transcription of the HCAVIII gene in test tissue compared 
to normal tissue, as determined by autoradiography (X-ray 
film) or alternatively by the quantitation of silver 
grains following emulsion autoradiography, would provide 
evidence of a positive diagnosis for lung cancer. 

Example 2 

Fluorescent In Situ Hybridization (FISH) Using DNA Probes 
Derived from the HCAVIII Gene 

A genomic clone to the HCAVIII gene (SEQ ID NO: 1) is 
isolated using a PCR primer pair which has been 
identified from the pLC56 cDNA sequence. This primer pair 
is located in putative exon 6 of the pLC56 gene, and the 
primers are identified as Probe Exon 6A 
(5'-ACATTGAAGAGCTGCTTCCGG-3'; SEQ ID NO: 19) and Probe 
Exon 6B (5'-AATTTGCACGGGGTTTCGG-3'; SEQ ID NO: 20). The 
genomic clone of HCAVIII is then identified as a PCR 
product of about 119 bp using this primer pair from the 
designated genomic clone. This result is confirmed by 
Southern blotting and DNA sequence analysis. A sequence 
of 1363 bp derived from the HCAVIII genomic clone is 
reported in SEQ ID NO: 21. This sequence is located 
directly before the HCAVIII cDNA and constitutes the 
putative promoter of this gene and likely contains 
transcription regulatory elements directly implicated in 
HCAVIII expression. 
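
The expected size of the PCR product can be estimated in silico from the cDNA. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the template is a fragment transcribed from the SEQ ID NO: 1 listing, and one codon that is illegible in that listing (GAA) is taken from the Probe Exon 6A sequence itself. It also assumes no intron lies between the two primer sites.

```python
# Illustrative in-silico PCR for the exon 6 primer pair (hypothetical helpers).
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

FORWARD = "ACATTGAAGAGCTGCTTCCGG"  # Probe Exon 6A (SEQ ID NO: 19)
REVERSE = "AATTTGCACGGGGTTTCGG"    # Probe Exon 6B (SEQ ID NO: 20)

# Fragment of SEQ ID NO: 1 spanning both primer sites. The GAA codon in the
# first segment is assumed from the forward primer, not read from the listing.
TEMPLATE = (
    "AACATTGAAGAGCTGCTTCCGGAGAGGACC"
    "GCTGAATATTACCGCTACCGGGGGTCCCTGACCACACCCCCTTGCAAC"
    "CCCACTGTGCTCTGGACAGTTTTCCGAAACCCCGTGCAAATTTCCCAG"
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Return the product length for exact-match primers, or None if absent."""
    start = template.find(forward)
    if start < 0:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end < 0:
        return None
    return end + len(site) - start


if __name__ == "__main__":
    print(amplicon_length(TEMPLATE, FORWARD, REVERSE))
```

On this fragment the two primers bracket a product of 119 bp, in line with the size stated above.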

The DNA probe comprising the genomic clone of HCAVIII 
plus flanking sequences is labeled in a random primer 
reaction with digoxigenin-11-dUTP (Boehringer Mannheim 
Biochemicals, Indianapolis, IN) by combining the DNA with 
dNTP(-TTP, final 0.05 mM), digoxigenin-11-dUTP/dTTP 
(0.0125 mM and 0.0375 mM, final), 10 mM 2-mercaptoethanol, 
50 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 20 U of DNA 
polymerase I and 1 ng/ml DNAase. The reaction is 
incubated at 15°C for two hours, and then terminated by 
adding EDTA to a final concentration of 10 mM. The 
labeled DNA probe is further purified by gel filtration 
chromatography. It is apparent to those skilled in the 
art that other suitable substrates such as biotin-11-dUTP 
can be substituted for digoxigenin-11-dUTP in the 
procedure above. 

A hybridization mix is prepared by combining 50% 
deionized formamide, 0.3 M NaCl, 10 mM Tris-HCl, pH 8.0, 1 
mM EDTA, 1X Denhardt's solution (0.02% Ficoll 400, 0.02% 
polyvinylpyrrolidone, and 0.02% bovine serum albumin), 500 
ug/ml yeast tRNA, 500 ug/ml poly (A), 50 mM dithiothreitol, 
10% polyethyleneglycol 6000, and the labeled DNA probe. 

Single cell suspensions of tissue biopsy material or 
normal tissue are fixed in methanol/glacial acetic acid 
(3:1 vol/vol) and dropped onto microscope slides. 
(Anastasi, et al., "Detection of Trisomy 12 in chronic 
lymphocytic leukemia by fluorescence in situ hybridization 
to interphase cells: a simple and sensitive method," Blood 
1992; 77:2456-2462). After the slides are heated for 1-2 
hours at 60°C, the hybridization mix is applied to the 
slides which are then incubated at 45°C in a moist chamber 
for 0.5-3 hours. After incubation, the slides are washed 
three times with a solution comprising 50% formamide and 
2X SSC at 42°C, washed twice in 2X SSC at 42°C, and 
finally washed in 4X SSC at room temperature. The slide 
is blocked with a solution of 4X SSC and 1% BSA, and then 
washed with a solution of 4X SSC and 1% Triton X-100. 

The hybridized digoxigenin-labeled probe is detected 
by adding a mixture of sheep anti-digoxigenin antibody 
(Boehringer Mannheim) diluted in 0.1 M sodium phosphate, 
pH 8.0, 5% nonfat dry milk, and 0.02% sodium azide, 
followed by the addition of fluorescein-conjugated rabbit 
anti-sheep IgG for detection. The slides are then washed 
in PBS, mounted in Vectashield (Vector Laboratories, Inc., 
Burlingame, CA), and viewed by fluorescent microscopy. 

Hybridization signals are enumerated in tumor derived 
tissue and then compared to normal tissue. Normal tissue 
displays two distinct hybridization signals characteristic 
of a diploid state. Enumeration over the rate of two 
hybridization signals/cell is considered significant. 

Example 3 

Expression of HCAVIII 

Expression of foreign proteins is often performed in 
E. coli when an immunogen or large amounts of protein are 
desired, as in the development of a diagnostic kit. A 
preferred system for E. coli expression has been described 
(Smith, et al., "Single-step purification of polypeptides 
expressed in Escherichia coli as fusions with glutathione 
S-transferase," Gene 1988; 67:31-40) whereby glutathione 
transferase is expressed with amino acids representing the 
cloned protein of interest attached to the 
carboxyl-terminus. The fusion protein can then be purified via 
affinity chromatography and the protein of interest fused 
to glutathione transferase released by digestion with the 
protease thrombin or alternatively the fusion protein is 
released intact from the affinity column by competing 
levels of free glutathione. 

To express the HCAVIII protein (SEQ ID NO: 4) of this 
invention in E. coli using the above described technology, 
an expression plasmid was produced in which the 
glutathione transferase gene is fused in frame with the 
HCAVIII gene (SEQ ID NO: 1) to produce a fusion protein. The 
fusion gene/expression plasmid was assembled from nucleic 
acids derived from the following sources. First, the 
expression plasmid pGEX4T-1 (Pharmacia, Piscataway, NJ) was 
cleaved in the polycloning region with the restriction 
endonucleases BamHI and EcoRI to permit insertion of the 
HCAVIII gene. Second, an oligonucleotide was synthesized, 
being 5'-GTCCACTTGGATCCGTTCACTGG-3' (SEQ ID NO: 22). Using 
the in vitro mutagenesis procedure described by Kunkel 
(Proc Natl Acad Sci USA 1985; 82:488-492) and the above 
oligonucleotide, a BamHI restriction site was created 
without altering the amino acids encoded by the original 
sequence. In addition, the created BamHI site was situated 
in correct reading frame and proximity to the predicted 
cleavage site separating the signal peptide from the 
mature protein. The DNA sequences encoding the mature 
protein were released from the mutagenesis vector as a 
BamHI/EcoRI fragment, where the EcoRI site originates from 
a polycloning region of the DNA sequencing vector pUC19 
found downstream of the HCAVIII gene. The DNA fragments 
described above, comprising pGEX4T-1 cleaved at BamHI and 
EcoRI and the HCAVIII gene released as a BamHI/EcoRI 
fragment, were combined in a mixture composed of 1X T4 
ligase buffer (50 mM Tris-HCl, 10 mM MgCl2, 20 mM 
dithiothreitol, 1 mM ATP, 50 ug/ml BSA, final pH 7.5) and 
T4 DNA ligase (New England Biolabs, Beverly, MA). The 
ligated DNA was used to transform a suitable strain of E. 
coli such as XL-1 Blue (Stratagene). The recovered 
plasmid is sequenced to confirm the expected DNA sequence. 
Protein expression is induced in E. coli with the chemical 
isopropyl β-thiogalactoside, and the fusion protein is 
released by cell lysis, followed by denaturation and 
resolubilization of the fusion protein with 8 M urea/20 
mM Tris-Cl (pH 8.5)/10 mM dithiothreitol, dialysis and 
protein renaturation, and finally binding to an affinity 
column composed of glutathione-agarose (Sigma, St. Louis, 
MO) and cleavage with thrombin to release the HCAVIII 
protein. The resulting protein is suitable as an 
immunogen for polyclonal or monoclonal antibody production 
and for usage in an ELISA kit as an internal standard and 
positive control. Carbonic anhydrase enzyme activity (as 
described in Example 6) was measured for E. coli-derived 
HCAVIII and HCAVIII truncated form (SEQ ID NO: 15) and 
compared to commercially obtained human carbonic anhydrase 
II (Sigma, St. Louis, MO). The activity, as reported in 
Enzyme Units (U)/mg, for human carbonic anhydrase II was 
3571 U/mg, for HCAVIII was 274 U/mg and for the HCAVIII 
truncated form was 2632 U/mg. These results indicated that 
an enzymatically active and renaturable HCAVIII derived 
from E. coli, of comparable enzymatic activity to human 
carbonic anhydrase II, was obtained. 
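
The BamHI site introduced by the mutagenic oligonucleotide (SEQ ID NO: 22) can be examined with a short script. This Python sketch is illustrative only: the helper names are hypothetical, the wild-type stretch is transcribed from the SEQ ID NO: 1 listing, and that stretch is assumed to begin on a codon boundary (the CCA codon for Pro).

```python
# Illustrative check that the new BamHI site is a silent change.
# Helper names are hypothetical; sequences are taken as printed.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CODONS = {"GGT": "Gly", "GGA": "Gly"}  # only the codons needed here

MUTAGENIC_OLIGO = "GTCCACTTGGATCCGTTCACTGG"  # SEQ ID NO: 22 (antisense)
WILD_TYPE = "CCAGTGAACGGTTCCAAGTGGAC"        # matching sense stretch, in frame
BAMHI_SITE = "GGATCC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def changed_positions(wild: str, mutant: str) -> list:
    """Return the 0-based indices at which two equal-length sequences differ."""
    return [i for i, (w, m) in enumerate(zip(wild, mutant)) if w != m]


def codon_at(seq: str, index: int) -> str:
    """Return the codon containing `index`, assuming frame starts at index 0."""
    start = index - index % 3
    return seq[start:start + 3]


if __name__ == "__main__":
    mutant = reverse_complement(MUTAGENIC_OLIGO)
    for i in changed_positions(WILD_TYPE, mutant):
        before, after = codon_at(WILD_TYPE, i), codon_at(mutant, i)
        print(i, before, CODONS[before], "->", after, CODONS[after])
    print("BamHI site present:", BAMHI_SITE in mutant)
```

Under these assumptions a single base differs, GGT becomes GGA, both codons specify Gly, and the GGATCC recognition sequence appears only in the mutagenized strand.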

The length of the resulting protein can be varied by 
altering the length of SEQ ID NO: 1 prior to insertion into 
the expression plasmid, or by cleavage of amino acids from 
the protein resulting in the above example. 
Structure/function studies of other HCA's suggest modifications (as 
defined by deletions at the N-terminal and C-terminal) 
more extensive than disclosed in SEQ ID NO: 12 would still 
permit the production and use of a protein as an immunogen 
or standard, these deletions being a protein defined by 
about amino acid residue 3 to amino acid residue 259 in 
SEQ ID NO: 12. Using existing technology one could 
synthesize a peptide of approximately 10 to 40 amino acids 
in length that comprises a structural domain of HCAVIII. 
This synthesized peptide, coupled to a carrier protein, 
could be used for generating polyclonal antisera specific 
for native HCAVIII. 

Example 4 
Production of Antibodies to HCAVIII 

The production of polyclonal antisera is described in 
great detail in Harlow, et al., Antibodies: A Laboratory 
Manual, Cold Spring Harbor Laboratories, New York, 1988, 
incorporated herein by reference. The HCAVIII protein 
(SEQ ID NO: 4) in the presence of an adjuvant is injected 
into rabbits with a series of booster shots on a 
prescribed schedule optimal for high titers of antibody in 
serum. A total of seven biweekly bleeds were obtained 
from two rabbits immunized with HCAVIII truncated protein 
(SEQ ID NO: 15). The resulting anti-HCAVIII serum titer 
was compared to preimmune sera of the same rabbits and 
determined to be 1000 to 2000-fold greater, hence suitable 
as a reagent for indirect ELISA (Example 5). Rabbit 
antibody was partially purified by precipitation with 
ammonium sulfate (50%, final) followed by dialysis and 
fractionation by preparative DEAE-HPLC. 

An extensive description for producing monoclonal 
antibodies derived from the spleen B cells of an immunized 
mouse and an immortalized myeloma cell is found in the 
above reference for polyclonal antisera production. Mice 
are immunized with either the purified HCAVIII protein or 
a glutathione/HCAVIII fusion protein. Following cell 
fusion, selection for hybrid cells and subcloning, 
hybridomas are screened for a positive antibody against 
whole A549 cells or purified HCAVIII protein using an 
indirect ELISA assay as described for the ELISA kit (see 
Example 5) . 

Example 5 
ELISA Assay of Shed HCAVIII 

An indirect ELISA screening assay for HCAVIII protein 
(SEQ ID NO: 4) has been designed to detect and monitor the 
HCAVIII protein in body fluids including but not limited 
to serum and other biological fluids such as sputum or 
bronchial effluxion at effective levels necessary for 
sensitive but accurate determinations. It is intended to 
aid in the early diagnosis of non-small cell lung cancer, 
for which there currently is no effective treatment. An 
early-detection, accurate, non-invasive assay for non- 
small cell lung cancer would be of great benefit in the 
management of this disease. 

The immunochemicals used in this procedure were 
rabbit anti-human HCAVIII antibody (purified IgG, IgM) 
produced according to the procedure given in Example 4, 
mouse anti-human HCAVIII (monoclonal) also produced 
according to the procedure given in Example 4, and goat 
anti-rabbit IgG/peroxidase conjugate. The HCAVIII protein 
standard and internal positive control were produced as 
described in Example 3 for expression in E. coli. 

Substrate components include 1 M H2SO4 stored at room 
temperature and 3,3',5,5'-tetramethylbenzidine (TMB) (Sigma 
Chemical Co.) used as a peroxidase substrate and stored at 
room temperature in the dark. 

Several buffers, diluents, and blocking agents were 
used in the procedure. Note that no sodium azide 
preservative was used in any of the buffers. This was 
done to avoid any possible interference from the azide 
with the peroxidase conjugate. 

Phosphate buffered saline (PBS) was prepared by 
adding 32.0 g sodium chloride, 0.8 g potassium phosphate, 
monobasic, 0.8 g potassium chloride, and 4.6 g sodium 
phosphate, dibasic, anhydrous, to 3.2 L deionized water 
and mixing to dissolve. After bringing the solution to 4 
L with deionized water and mixing, the pH was about 7.2. 
The buffer can be stored at 4°C for a maximum of 3 weeks. 
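
The final salt concentrations of this PBS recipe follow from simple arithmetic. The Python sketch below is a hedged illustration: the function names are hypothetical, the formula weights are standard values for the anhydrous salts, and a final volume of 4 L is assumed.

```python
# Illustrative molarity calculation for the PBS recipe (hypothetical helpers).
# Formula weights in g/mol are standard values for the anhydrous salts.
FORMULA_WEIGHT = {
    "NaCl": 58.44,
    "KH2PO4": 136.09,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
}

# Grams weighed out, as stated in the recipe.
GRAMS = {"NaCl": 32.0, "KH2PO4": 0.8, "KCl": 0.8, "Na2HPO4": 4.6}

FINAL_VOLUME_L = 4.0  # assumed final volume


def millimolar(grams: float, formula_weight: float, volume_l: float) -> float:
    """Convert a weighed mass into a concentration in mmol/L."""
    return grams / formula_weight / volume_l * 1000.0


def pbs_composition(volume_l: float = FINAL_VOLUME_L) -> dict:
    """Return the concentration of each salt in mmol/L."""
    return {
        salt: millimolar(GRAMS[salt], FORMULA_WEIGHT[salt], volume_l)
        for salt in GRAMS
    }


if __name__ == "__main__":
    for salt, conc in pbs_composition().items():
        print(f"{salt}: {conc:.2f} mM")
```

At 4 L the recipe works out to roughly 137 mM NaCl, 2.7 mM KCl, 1.5 mM potassium phosphate and 8.1 mM sodium phosphate, the usual composition of phosphate buffered saline.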

Two bovine serum albumin solutions (BSA) were 
utilized as diluents. A 1% BSA solution in PBS, utilized 
as the second antibody/conjugate diluent, was prepared by 

Co.) to 80 ml of PBS, allowing it to stand as it slowly 
goes into solution, adding PBS to a final volume of 100 
ml, and then mixing. This diluent can be stored at 4°C 
for a maximum of 2 weeks; however if the solution becomes 
turbid, it should be discarded. As a diluent for the 
standards and samples, a 0.025% BSA solution in PBS was 
prepared fresh for each assay by diluting the 1% BSA 
diluent with PBS 1:40 (vol/vol). 

A borate blocking buffer (0.17 M H3BO3, 0.12 M NaCl, 
0.05% Tween 20, 1 mM EDTA and 0.25% BSA) was also used. 

The substrate buffer was phosphate-citrate/sodium 
perborate (Sigma, St. Louis, Mo.). 

All assays were performed in Immulon IV plates 
(Dynatech, Chantilly, VA #011-010-6301). The assay plates 
were coated with a monoclonal antibody against HCAVIII by 
adding 50 ul of a 10 ug/ml solution of antibody in PBS to 
each well of Immulon IV plates. The plates were covered 
and incubated overnight at room temperature. The antibody 
solution was removed and the wells rinsed three times with 
deionized water. Three-hundred microliters (300 ul) of 
the borate blocking buffer was added to each well and 
incubated at room temperature for thirty minutes. The 
buffer was removed, the wells rinsed three times with 
deionized water, and the plates air dried. The plates 
were then wrapped and stored at 4°C. 

The standard, E. coli-derived HCAVIII truncated protein 
(SEQ ID NO: 15), was diluted to 32 ng/ml in PBS/0.025% BSA 
and two-fold serial dilutions were made in the same diluent. The 
samples were also diluted in PBS/0.025% BSA and 50 ul of 
standard or sample was applied to each well. The plates 
were incubated overnight, covered, at room temperature. 
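
The standard concentrations produced by the two-fold dilutions can be listed programmatically. This Python sketch is illustrative: the function name is hypothetical, and the lower bound of 0.5 ng/ml is taken from the linear range reported for the assay.

```python
# Illustrative two-fold dilution series for the ELISA standard.
def two_fold_series(top: float, lowest: float) -> list:
    """Return concentrations from `top` down to `lowest` in two-fold steps."""
    if top <= 0 or lowest <= 0:
        raise ValueError("concentrations must be positive")
    series = []
    value = float(top)
    while value >= lowest:
        series.append(value)
        value /= 2.0
    return series


if __name__ == "__main__":
    # 32 ng/ml top standard, diluted down to the 0.5 ng/ml sensitivity limit.
    for conc in two_fold_series(32, 0.5):
        print(f"{conc:g} ng/ml")
```

Starting from 32 ng/ml, seven standards (32, 16, 8, 4, 2, 1 and 0.5 ng/ml) span the range.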

The standard and sample solutions were removed from 
the wells and the wells were rinsed three times with 
deionized water. Three-hundred microliters (300 ul) 
borate blocking buffer was added to each well and 
incubated at room temperature for thirty minutes. The 
plates were rinsed again with deionized water and tapped 
(inverted) on paper towels to remove excess water. The 
second antibody, rabbit antiserum to HCAVIII truncated 
protein (SEQ ID NO: 15), was diluted to 1 ug/ml in PBS/1% 
BSA and 50 ul was added to each well. The plates were 
covered and incubated at room temperature for two hours. 

The antibody solution was removed from the wells 
which were then rinsed with deionized water three times. 
They were then blocked for ten minutes at room temperature 
with borate blocking buffer, rinsed again with deionized 
water three times, and tapped on paper towels. The 
antibody conjugate, goat F(ab')2 x rabbit IgG & IgL-HPRO 
(Tago, Camarillo, CA), was diluted 1:16,000 in PBS/1% BSA 
and 50 ul was added to each well. The plates were covered 
and incubated at room temperature for two hours. 

The antibody conjugate solution was removed from the 
wells and they were rinsed with deionized water three 
times, blocked with three-hundred ul borate buffer at room 
temperature for ten minutes, rinsed three times with 
deionized water, and tapped on paper towels. The 
substrate was prepared no more than fifteen minutes before 
use by dissolving one capsule of phosphate-citrate/sodium 
perborate (Sigma, St. Louis, Mo.) in 100 ml water. For 
each plate, one tablet of TMB was added to 10 ml of the 
phosphate-citrate/sodium perborate buffer and syringe 
filtered. One-hundred ul was added to each well and the 
plates were covered and incubated at room temperature in 
the dark for one hour. The reaction was stopped by adding 
50 ul of 1 M H2SO4 to each well. The plates were read on a 
Molecular Devices microplate reader at 450 nm. Under these 
conditions, a linear response was obtained from 0.5 to 32 
ng/ml using HCAVIII truncated protein as a standard, with 
the assay sensitivity at 0.5 ng/ml. No cross-reaction was 
observed against HCAII, an abundant carbonic anhydrase in 
human serum. 
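
Because the response is linear over the stated range, sample concentrations can be read from a fitted standard line. The Python sketch below is one possible way to do this and is not described in the procedure itself: the function names are hypothetical, an ordinary least-squares fit is assumed, and readings that fall outside 0.5 to 32 ng/ml are rejected rather than extrapolated.

```python
# Illustrative standard-curve fit for the ELISA readout (hypothetical helpers).
from typing import Optional, Sequence, Tuple


def fit_line(conc: Sequence[float], absorbance: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of absorbance = slope * concentration + intercept."""
    n = len(conc)
    if n < 2 or n != len(absorbance):
        raise ValueError("need at least two paired standards")
    mean_x = sum(conc) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    if sxx == 0:
        raise ValueError("standards must differ in concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate(
    absorbance: float,
    slope: float,
    intercept: float,
    low: float = 0.5,
    high: float = 32.0,
) -> Optional[float]:
    """Return ng/ml for a reading, or None when outside the linear range."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    value = (absorbance - intercept) / slope
    return value if low <= value <= high else None
```

In use, the absorbances read at 450 nm for the standards would be passed to `fit_line`, and each sample reading to `interpolate`; samples returning None would need to be re-assayed at a different dilution.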

Example 6 

Carbonic Anhydrase (CA) Activity of Biopsy Tissue 

Ice cold solutions of ITB (20 mM imidazole, 5 mM 
Tris, and 0.4 mM para-nitrophenol, pH 9.4-9.9) and Buffer 
A (25 mM triethanolamine, 59 mM H2SO4, and 1 mM 
benzamidine HCl) are prepared. 

A homogenate is prepared by scraping with a cell 
scraper into 1-2 ml of Buffer A a monolayer of tissue 
cells cultured from a tissue sample taken from a biopsy. 
A portion of the sample is then boiled to inactivate CA. 

A tube is placed in an ice water bath. For the 
macroassay, a 10 x 75 mm glass tube and rubber stopper 
with 16 gauge and 18 gauge needle ports are used; for the 
microassay, a 6 x 50 mm glass tube and rubber stopper 
with 18 gauge needle port and 20 gauge needle with 
attached PE90 tubing. The sample is added along with 
ice cold water to a final volume of 500 ul for macroassay 
or 50 ul for microassay. 500 ul (macro) or 50 ul (micro) 
ice cold water is used for a water control. 10 ul 
antifoam (A. H. Thomas, Philadelphia, PA) is added to the 
tube which is then incubated in ice water for 0.5 to 3 
minutes. 

The tube is capped with a stopper and CO2 at 150 
ml/min (macro) or 100 ml/min (micro) is bubbled through 
the smaller needle port for 30 sec. 

50 ul (macro) or 50 ul (micro) of the ITB solution is 
rapidly added through the larger needle port with a cold 
Hamilton syringe. The sample becomes yellow. 

Using a timer or stopwatch, the time at which the 
solution in the tube becomes colorless is measured and 
recorded. The tube may be momentarily removed from the 
bath and held in front of a white background to determine 
the color change. Comparison to a previously acidified 
sample may be used. 

The procedure is repeated with the boiled sample. 
The volume of sample that corresponds to approximately one 
enzyme unit is determined using the formula below. 

Volume (1 EU) = V_EU = volume used x log 2 / log (boiled 
time/activated time) 

One enzyme unit is the activity that halves the boiled 
control time. 
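
The enzyme unit formula can be written as a small function. This Python sketch is an illustration only: the function names are hypothetical, and "activated time" is taken to mean the time recorded for the unboiled sample.

```python
# Illustrative enzyme-unit calculation for the CA assay (hypothetical helpers).
import math


def volume_per_enzyme_unit(volume_used: float, boiled_time: float, sample_time: float) -> float:
    """Return the sample volume that corresponds to one enzyme unit.

    Implements V_EU = volume used x log 2 / log(boiled time / sample time).
    """
    if sample_time <= 0 or boiled_time <= sample_time:
        raise ValueError("the active sample must be faster than the boiled control")
    return volume_used * math.log(2) / math.log(boiled_time / sample_time)


def enzyme_units(volume_used: float, boiled_time: float, sample_time: float) -> float:
    """Return the number of enzyme units present in the volume assayed."""
    return volume_used / volume_per_enzyme_unit(volume_used, boiled_time, sample_time)


if __name__ == "__main__":
    # A sample that halves the boiled control time contains one unit.
    print(volume_per_enzyme_unit(10.0, 60.0, 30.0))
    print(enzyme_units(10.0, 60.0, 30.0))
```

A sample that halves the boiled control time returns its own volume as V_EU, consistent with the definition of one enzyme unit; dividing the units by the protein assayed would give a specific activity in U/mg.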

The assay is repeated 1-3 times with the sample and 
boiled sample, using the adjusted volume of sample. 

SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Cytoclonal Pharmaceutics, Inc. 

(B) STREET: 9000 Harry Hines Blvd, Suite 330 

(C) CITY: Dallas 

(D) STATE: Texas 

(E) COUNTRY: USA 

(F) POSTAL CODE (ZIP): 75235 

(G) TELEPHONE: (214) 353-2923 

(H) TELEFAX: (214) 350-9514 

(I) TELEX: 



(ii) TITLE OF INVENTION: Lung Cancer Marker 
(iii) NUMBER OF SEQUENCES: 22 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: RICHARDS, MEDLOCK & ANDREWS 

(B) STREET: 1201 Elm Street, Suite 4500 

(C) CITY: Dallas 

(D) STATE: TX 

(E) COUNTRY: US 

(F) ZIP: 75270-2197 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: John A. Harre 

(B) REGISTRATION NUMBER: 37,345 

(C) REFERENCE/DOCKET NUMBER: B35792CIPPCT 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 214-939-4500 

(B) TELEFAX: 214-939-4600 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1104 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 32..1093 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 119..1093 

(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1013..1024 

(D) OTHER INFORMATION: /note= "phosphorylation site 
recognized by protein kinase C and other kina . . . " 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GCCCGCGCCC GCCCCGCAGG AGCCCGCGAA G ATG CCC CGG CGC AGC CTG CAC  52
                                   Met Pro Arg Arg Ser Leu His
                                   -29             -25

GCG GCG GCC GTG CTC CTG CTG GTG ATC TTA AAG GAA CAG CCT TCC AGC  100
Ala Ala Ala Val Leu Leu Leu Val Ile Leu Lys Glu Gln Pro Ser Ser
        -20                 -15                 -10

CCG GCC CCA GTG AAC GGT TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG  148
Pro Ala Pro Val Asn Gly Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly
    -5                  1               5                   10

GAG AAT AGC TGG TCC AAG AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG  196
Glu Asn Ser Trp Ser Lys Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln
                15                  20                  25

TCC CCC ATA GAC CTG CAC AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC  244
Ser Pro Ile Asp Leu His Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu
            30                  35                  40

ACG CCC CTC GAG TTC CAA GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT  292
Thr Pro Leu Glu Phe Gln Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe
        45                  50                  55

CTC CTG ACC AAC AAT GGC CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC  340
Leu Leu Thr Asn Asn Gly His Ser Val Lys Leu Asn Leu Pro Ser Asp
    60                  65                  70

ATG CAC ATC CAG GGC CTC CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC  388
Met His Ile Gln Gly Leu Gln Ser Arg Tyr Ser Ala Thr Gln Leu His
75                  80                  85                  90

CTG CAC TGG GGG AAC CCG AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC  436
Leu His Trp Gly Asn Pro Asn Asp Pro His Gly Ser Glu His Thr Val
                95                  100                 105

AGC GGA CAG CAC TTC GCC GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA  484
Ser Gly Gln His Phe Ala Ala Glu Leu His Ile Val His Tyr Asn Ser
            110                 115                 120

GAC CTT TAT CCT GAC GCC AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC  532
Asp Leu Tyr Pro Asp Ala Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu
        125                 130                 135

GCT GTC CTG GCT GTT CTC ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT  580
Ala Val Leu Ala Val Leu Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr
    140                 145                 150

GAC AAG ATC TTC AGT CAC CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA  628
Asp Lys Ile Phe Ser His Leu Gln His Val Lys Tyr Lys Gly Gln Glu
155                 160                 165                 170

GCA TTC GTC CCG GGA TTC AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC  676
Ala Phe Val Pro Gly Phe Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr
                175                 180                 185

GCT GAA TAT TAC CGC TAC CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC  724
Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn
            190                 195                 200

CCC ACT GTG CTC TGG ACA GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG  772
Pro Thr Val Leu Trp Thr Val Phe Arg Asn Pro Val Gln Ile Ser Gln
        205                 210                 215

GAG CAG CTG CTG GCT TTG GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC  820
Glu Gln Leu Leu Ala Leu Glu Thr Ala Leu Tyr Cys Thr His Met Asp
    220                 225                 230

GAC CCT TCC CCC AGA GAA ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG  868
Asp Pro Ser Pro Arg Glu Met Ile Asn Asn Phe Arg Gln Val Gln Lys
235                 240                 245                 250

TTC GAT GAG AGG CTG GTA TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT  916
Phe Asp Glu Arg Leu Val Tyr Thr Ser Phe Ser Gln Val Gln Val Cys
                255                 260                 265

ACT GCG GCA GGA CTG AGT CTG GGC ATC ATC CTC TCA CTG GCC CTG GCT  964
Thr Ala Ala Gly Leu Ser Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala
            270                 275                 280

GGC ATT CTT GGC ATC TGT ATT GTG GTG GTG GTG TCC ATT TGG CTT TTC  1012
Gly Ile Leu Gly Ile Cys Ile Val Val Val Val Ser Ile Trp Leu Phe
        285                 290                 295

AGA AGG AAG AGT ATC AAA AAA GGT GAT AAC AAG GGA GTC ATT TAC AAG  1060
Arg Arg Lys Ser Ile Lys Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys
    300                 305                 310

CCA GCC ACC AAG ATG GAG ACT GAG GCC CAC GCT TGAGGTCCCC G  1104
Pro Ala Thr Lys Met Glu Thr Glu Ala His Ala
315                 320                 325
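
The feature coordinates given for SEQ ID NO:1 can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the helper name `codon_count` and the variable names are assumptions introduced here, not part of the listing. It treats each LOCATION as a 1-based, inclusive nucleotide range and derives the residue counts implied by the CDS, mat_peptide and misc_feature entries.

```python
# Illustrative consistency check of the SEQ ID NO:1 feature table.
# All names below are hypothetical; only the coordinates come from the listing.

def codon_count(start: int, end: int) -> int:
    """Return the number of whole codons in a 1-based, inclusive range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a whole number of codons")
    return length // 3

CDS = (32, 1093)           # (ix) FEATURE, NAME/KEY: CDS
MAT_PEPTIDE = (119, 1093)  # (ix) FEATURE, NAME/KEY: mat_peptide
SITE = (1013, 1024)        # (ix) FEATURE, NAME/KEY: misc_feature

total_residues = codon_count(*CDS)
mature_residues = codon_count(*MAT_PEPTIDE)
leader_residues = total_residues - mature_residues

# Position of the annotated site in mature-peptide numbering.
site_first_residue = (SITE[0] - MAT_PEPTIDE[0]) // 3 + 1
site_last_residue = (SITE[1] - 2 - MAT_PEPTIDE[0]) // 3 + 1

print(total_residues, mature_residues, leader_residues)
print(site_first_residue, site_last_residue)
```

Under these assumptions the CDS corresponds to 354 residues, of which 29 precede the mature peptide (numbered -29 to -1) and 325 form the mature peptide, and the annotated site spans mature residues 299 to 302 (Arg Arg Lys Ser).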



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 354 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Pro Arg Arg Ser Leu His Ala Ala Ala Val Leu Leu Leu Val Ile
-29             -25                 -20                 -15

Leu Lys Glu Gln Pro Ser Ser Pro Ala Pro Val Asn Gly Ser Lys Trp
            -10                 -5                  1

Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys Lys Tyr Pro
    5                   10                  15

Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His Ser Asp Ile
20                  25                  30                  35

Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln Gly Tyr Asn
                40                  45                  50

Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly His Ser Val
            55                  60                  65

Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu Gln Ser Arg
        70                  75                  80

Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro Asn Asp Pro
    85                  90                  95

His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala Ala Glu Leu
100                 105                 110                 115

His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala Ser Thr Ala
                120                 125                 130

Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu Ile Glu Met
            135                 140                 145

Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His Leu Gln His
        150                 155                 160

Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe Asn Ile Glu
    165                 170                 175

Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser
180                 185                 190                 195

Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr Val Phe Arg
                200                 205                 210

Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu Glu Thr Ala
            215                 220                 225

Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu Met Ile Asn
        230                 235                 240

Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val Tyr Thr Ser
    245                 250                 255

Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser Leu Gly Ile
260                 265                 270                 275

Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys Ile Val Val
                280                 285                 290

Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys Lys Gly Asp
            295                 300                 305

Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu Thr Glu Ala
        310                 315                 320

His Ala
    325

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 986 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..975

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 895..906

(D) OTHER INFORMATION: /note= "phosphorylation site
recognized by protein kinase C and other kina . . . "

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:



TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG  48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC  96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA  144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC  192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC  240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG  288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC  336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC  384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC  432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC  480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC  528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC  576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA  624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG  672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA  720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA  768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT  816
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
            260                 265                 270

CTG GGC ATC ATC CTC TCA CTG GCC CTG GCT GGC ATT CTT GGC ATC TGT  864
Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
        275                 280                 285

ATT GTG GTG GTG GTG TCC ATT TGG CTT TTC AGA AGG AAG AGT ATC AAA  912
Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys
    290                 295                 300

AAA GGT GAT AAC AAG GGA GTC ATT TAC AAG CCA GCC ACC AAG ATG GAG  960
Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305                 310                 315                 320

ACT GAG GCC CAC GCT TGAGGTCCCC G  986
Thr Glu Ala His Ala
                325

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 325 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
            260                 265                 270

Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
        275                 280                 285

Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys
    290                 295                 300

Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305                 310                 315                 320

Thr Glu Ala His Ala
                325

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2134 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 116..1177

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 203..1177

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GTACTCGCCA CGGCACCCAG GCTGCGCGCA CGCGGTCCCG GTGTGCAGCT GGAGAGCGAG  60

CGGCCACCGG GAGCCCCCGG CACAGCCCGC GCCCGCCCCG CAGGAGCCCG CGAAG ATG  118
                                                             Met
                                                             -29

CCC CGG CGC AGC CTG CAC GCG GCG GCC GTG CTC CTG CTG GTG ATC TTA  166
Pro Arg Arg Ser Leu His Ala Ala Ala Val Leu Leu Leu Val Ile Leu
            -25                 -20                 -15

AAG GAA CAG CCT TCC AGC CCG GCC CCA GTG AAC GGT TCC AAG TGG ACT  214
Lys Glu Gln Pro Ser Ser Pro Ala Pro Val Asn Gly Ser Lys Trp Thr
        -10                 -5                  1

TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG AAG TAC CCG TCG  262
Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys Lys Tyr Pro Ser
5                   10                  15                  20

TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC AGT GAC ATC CTC  310
Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His Ser Asp Ile Leu
                25                  30                  35

CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA GGC TAC AAT CTG  358
Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln Gly Tyr Asn Leu
            40                  45                  50

TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC CAT TCA GTG AAG  406
Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly His Ser Val Lys
        55                  60                  65

CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC CAG TCT CGC TAC  454
Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu Gln Ser Arg Tyr
    70                  75                  80

AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG AAT GAC CCG CAC  502
Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro Asn Asp Pro His
85                  90                  95                  100

GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC GCC GAG CTG CAC  550
Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala Ala Glu Leu His
                105                 110                 115

ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC AGC ACT GCC AGC  598
Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala Ser Thr Ala Ser
            120                 125                 130

AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC ATT GAG ATG GGC  646
Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu Ile Glu Met Gly
        135                 140                 145

TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC CTT CAA CAT GTA  694
Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His Leu Gln His Val
    150                 155                 160

AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC AAC ATT GAA GAG  742
Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe Asn Ile Glu Glu
165                 170                 175                 180

CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC CGG GGG TCC CTG  790
Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser Leu
                185                 190                 195

ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA GTT TTC CGA AAC  838
Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr Val Phe Arg Asn
            200                 205                 210

CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG GAG ACA GCC CTG  886
Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu Glu Thr Ala Leu
        215                 220                 225

TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA ATG ATC AAC AAC  934
Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu Met Ile Asn Asn
    230                 235                 240

TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA TAC ACC TCC TTC  982
Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val Tyr Thr Ser Phe
245                 250                 255                 260

TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT CTG GGC ATC ATC  1030
Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser Leu Gly Ile Ile
                265                 270                 275

CTC TCA CTG GCC CTG GCT GGC ATT CTT GGC ATC TGT ATT GTG GTG GTG  1078
Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys Ile Val Val Val
            280                 285                 290

GTG TCC ATT TGG CTT TTC AGA AGG AAG AGT ATC AAA AAA GGT GAT AAC  1126
Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys Lys Gly Asp Asn
        295                 300                 305

AAG GGA GTC ATT TAC AAG CCA GCC ACC AAG ATG GAG ACT GAG GCC CAC  1174
Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu Thr Glu Ala His
    310                 315                 320

GCT TGAGGTCCCC GGAGCTCCCG GGCACATCCA GGAAGGACCT TGCTTTGGAC  1227
Ala
325



CCTACACACT TCGGCTCTCT GGACACTTGC GACACCTCAA GGTGTTCTCT GTAGCTCAAT  1287
CTGCAAACAT GCCAGGCCTC AGGGATCCTC TGCTGGGTGC CTCCTTGCCT TGGGACCATG  1347
GCCACCCCAG AGCCATCCGA TCGATGGATG GGATGCACTC TCAGACCAAG CAGCAGGAAT  1407
TCAAAGCTGC TTGCTGTAAC TGTGTGAGAT TGTGAAGTGG TCTGAATTCT GGAATCACAA  1467
ACCAAGCCAT GCTGGTGGGC CATTAATGGT TGGAAAACAC TTTCATCCGG GGCTTTGCCA  1527
GAGCGTGCTT TCAAGTGTCC TGGAAATTCT GCTGCTTCTC CAAGCTTTCA GACAAGAATG  1587
TCTCCCTCTG ATTTCCTTCT GCTATGACAA AACCTTTAAT CTGCACCTTA CAACTCGGGG  1647
ACAAATGGGG ACAGGAAGGA TCAAGTTGTA GAGAGAAAAA GAAAACAAGA GATATACATT  1707
GTGATATATT AGGGACACTT TCACAGTCCT GTCCTCTGGA TCACAGACAC TGCACAGACC  1767
TTAGGGAATG GCAGGTTCAA GTTCCACTTC TTGGTGGGGA TGAGAAGGGA GAGAGAGCTA  1827
GAGGGACAAA GAGAATGAGA AGACATGGAT GATCTGGGAG AGTCTCACTT TGGAATCAGA  1887
ATTGGAATCA CATTCTGTTT ATCAAGCCAT AATGTAAGGA CAGAATAATA CAATATTAAG  1947
TCCAAATCCA ACCTCCTGTC AGTGGAGCAG TTATGTTTTA TACTCTACAG ATTTTACAAA  2067
TAATGAGGCT GTTCCTTGAA AATGTGTTGT TGCTGTGTCC TGGAGGAGAC ATGAGTTCCG  2127
AGATGAC  2134

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 354 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Pro Arg Arg Ser Leu His Ala Ala Ala Val Leu Leu Leu Val Ile
-29             -25                 -20                 -15

Leu Lys Glu Gln Pro Ser Ser Pro Ala Pro Val Asn Gly Ser Lys Trp
            -10                 -5                  1

Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys Lys Tyr Pro
    5                   10                  15

Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His Ser Asp Ile
20                  25                  30                  35

Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln Gly Tyr Asn
                40                  45                  50

Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly His Ser Val
            55                  60                  65

Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu Gln Ser Arg
        70                  75                  80

Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro Asn Asp Pro
    85                  90                  95

His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala Ala Glu Leu
100                 105                 110                 115

His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala Ser Thr Ala
                120                 125                 130

Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu Ile Glu Met
            135                 140                 145

Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His Leu Gln His
        150                 155                 160

Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe Asn Ile Glu
    165                 170                 175

Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr Arg Gly Ser
180                 185                 190                 195

Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr Val Phe Arg
                200                 205                 210

Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu Glu Thr Ala
            215                 220                 225

Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu Met Ile Asn
        230                 235                 240

Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val Tyr Thr Ser
    245                 250                 255

Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser Leu Gly Ile
260                 265                 270                 275

Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys Ile Val Val
                280                 285                 290

Val Val Ser Ile Trp Leu Phe Arg Arg Lys Ser Ile Lys Lys Gly Asp
            295                 300                 305

Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu Thr Glu Ala
        310                 315                 320

His Ala
    325

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 624 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



CCAATCTGCC TTTGAATCTG GAGGAAATAG GCAGAAACAA AATGACTGTA GAACTTATTC  60
TCTGTAGGCC AAATTTCATT TCAGCCACTT CTGCAGGATC CCTACTGCCA ACCTGGAATG  120
GAGACTTTTA TCTACTTCTC TCTCTCTGAA GATGTCAAAT CGTGGTTTAG ATCAAATATA  180
TTTCAAGCTA TAAAAGCAGG AGGTTATCTG TGCAGGGGGC TGGCATCATG TATTTAGGGG  240
CAAGTAATAA TGGAATGCTA CTAAGATACT CCATATTCTT CCCCGAATCA CACAGACAGT  300
TTCTGACAGG CGCAACTCCT CCATTTTCCT CCCGCAGGTG AGAACCCTGT GGAGATGAGT  360
CAGTGCCATG ACTGAGAAGG AACCGACCCC TAGTTGAGAG CACCTTGCAG TTCCCCGAGA  420
ACTTTCTGAT TCACAGTCTC ATTTTGACAG CATGAAATGT CCTCTTGAAG CATAGCTTTT  480
TAAATATCTT TTTCCTTCTA CTCCTCCCTC TGACTCTAAG AATTCTCTCT TCTGGAATCG  540
CTTGAACCCA GGAGGCGGAG GTTGCAGTAA GCCAAGGTCA TGCCACTGCA CTCTAGCCTG  600
GGTGACAGAG CGAGACTCCA TCTC  624



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

AGA AGG AAG AGT
Arg Arg Lys Ser
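
The pairing of the 12-nucleotide sequence of SEQ ID NO: 8 with the four residues shown beneath it can be reproduced with the standard genetic code. The Python sketch below is illustrative only: the function name `translate` and the table-building approach are assumptions introduced here, and the standard (nuclear) genetic code is assumed to apply.

```python
# Illustrative translation of a short coding sequence with the standard code.
# Names are hypothetical; only the input codons come from the listing.

BASES = "TCAG"
# One-letter amino acids for all 64 codons in TCAG order ('*' marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

def translate(dna: str) -> list[str]:
    """Translate a DNA string (whitespace ignored) into three-letter residues."""
    cleaned = "".join(dna.split()).upper()
    usable = len(cleaned) - len(cleaned) % 3  # drop any trailing partial codon
    return [THREE_LETTER[CODON_TABLE[cleaned[i:i + 3]]] for i in range(0, usable, 3)]

print(" ".join(translate("AGA AGG AAG AGT")))
```

Applied to the codons of SEQ ID NO: 8, this yields Arg Arg Lys Ser, the same four residues listed as SEQ ID NO: 9.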

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Arg Arg Lys Ser 
1 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TGAGTCGACG 10



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
AATTCGTCGA CTCA 14 
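
The relationship between the two short oligonucleotides of SEQ ID NO: 10 and SEQ ID NO: 11 can be examined by taking a reverse complement. The Python sketch below is illustrative only: the function name `reverse_complement` and the variable names are assumptions introduced here, and this section of the listing does not itself state how the two oligonucleotides are used, so reading them as a complementary pair is an inference from the sequences alone.

```python
# Illustrative reverse-complement comparison of SEQ ID NO: 10 and SEQ ID NO: 11.
# Names are hypothetical; only the two sequences come from the listing.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (whitespace ignored)."""
    cleaned = "".join(seq.split()).upper()
    return cleaned.translate(COMPLEMENT)[::-1]

SEQ_ID_10 = "TGAGTCGACG"
SEQ_ID_11 = "AATTCGTCGA CTCA"

seq11 = "".join(SEQ_ID_11.split())
rc10 = reverse_complement(SEQ_ID_10)

# Check whether SEQ ID NO: 11 ends with the reverse complement of SEQ ID NO: 10
# and, if so, report the unpaired 5' extension that remains.
pairs = seq11.endswith(rc10)
extension = seq11[: len(seq11) - len(rc10)] if pairs else None

print(rc10, pairs, extension)
```

On these inputs the last ten bases of SEQ ID NO: 11 are the reverse complement of SEQ ID NO: 10, leaving a four-base 5' extension (AATT) on SEQ ID NO: 11.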



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 813 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..813

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG  48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC  96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA  144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC  192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC  240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG  288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC  336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC  384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC  432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC  480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC  528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC  576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA  624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG  672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA  720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA  768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG  813
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu
            260                 265                 270

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 271 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu
            260                 265                 270

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 822 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..822 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG  48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC  96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA  144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC  192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC  240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG  288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC  336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC  384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC  432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC  480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC  528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC  576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA  624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG  672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA  720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA  768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT  816
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
            260                 265                 270

CTG GGC  822
Leu Gly



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 274 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 



Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
            260                 265                 270

Leu Gly



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CTTTTTTGAT ACCCTTCCTT CTGAA

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 986 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..975

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TCC AAG TGG ACT TAT TTT GGT CCT GAT GGG GAG AAT AGC TGG TCC AAG  48
Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1               5                   10                  15

AAG TAC CCG TCG TGT GGG GGC CTG CTG CAG TCC CCC ATA GAC CTG CAC  96
Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
            20                  25                  30

AGT GAC ATC CTC CAG TAT GAC GCC AGC CTC ACG CCC CTC GAG TTC CAA  144
Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
        35                  40                  45

GGC TAC AAT CTG TCT GCC AAC AAG CAG TTT CTC CTG ACC AAC AAT GGC  192
Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
    50                  55                  60

CAT TCA GTG AAG CTG AAC CTG CCC TCG GAC ATG CAC ATC CAG GGC CTC  240
His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65                  70                  75                  80

CAG TCT CGC TAC AGT GCC ACG CAG CTG CAC CTG CAC TGG GGG AAC CCG  288
Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
                85                  90                  95

AAT GAC CCG CAC GGC TCT GAG CAC ACC GTC AGC GGA CAG CAC TTC GCC  336
Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
            100                 105                 110

GCC GAG CTG CAC ATT GTC CAT TAT AAC TCA GAC CTT TAT CCT GAC GCC  384
Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
        115                 120                 125

AGC ACT GCC AGC AAC AAG TCA GAA GGC CTC GCT GTC CTG GCT GTT CTC  432
Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
    130                 135                 140

ATT GAG ATG GGC TCC TTC AAT CCG TCC TAT GAC AAG ATC TTC AGT CAC  480
Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145                 150                 155                 160

CTT CAA CAT GTA AAG TAC AAA GGC CAG GAA GCA TTC GTC CCG GGA TTC  528
Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
                165                 170                 175

AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC  576
Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
            180                 185                 190

CGG GGG TCC CTG ACC ACA CCC CCT TGC AAC CCC ACT GTG CTC TGG ACA  624
Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
        195                 200                 205

GTT TTC CGA AAC CCC GTG CAA ATT TCC CAG GAG CAG CTG CTG GCT TTG  672
Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
    210                 215                 220

GAG ACA GCC CTG TAC TGC ACA CAC ATG GAC GAC CCT TCC CCC AGA GAA  720
Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225                 230                 235                 240

ATG ATC AAC AAC TTC CGG CAG GTC CAG AAG TTC GAT GAG AGG CTG GTA  768
Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
                245                 250                 255

TAC ACC TCC TTC TCC CAA GTG CAA GTC TGT ACT GCG GCA GGA CTG AGT  816
Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
            260                 265                 270

CTG GGC ATC ATC CTC TCA CTG GCC CTG GCT GGC ATT CTT GGC ATC TGT  864
Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
        275                 280                 285

ATT GTG GTG GTG GTG TCC ATT TGG CTT TTC AGA AGG AAG GGT ATC AAA  912
Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Gly Ile Lys
    290                 295                 300

AAA GGT GAT AAC AAG GGA GTC ATT TAC AAG CCA GCC ACC AAG ATG GAG  960
Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305                 310                 315                 320

ACT GAG GCC CAC GCT TGAGGTCCCC G  986
Thr Glu Ala His Ala
                325



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 325 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Ser Lys Trp Thr Tyr Phe Gly Pro Asp Gly Glu Asn Ser Trp Ser Lys
1 5 10 15

Lys Tyr Pro Ser Cys Gly Gly Leu Leu Gln Ser Pro Ile Asp Leu His
20 25 30

Ser Asp Ile Leu Gln Tyr Asp Ala Ser Leu Thr Pro Leu Glu Phe Gln
35 40 45

Gly Tyr Asn Leu Ser Ala Asn Lys Gln Phe Leu Leu Thr Asn Asn Gly
50 55 60

His Ser Val Lys Leu Asn Leu Pro Ser Asp Met His Ile Gln Gly Leu
65 70 75 80

Gln Ser Arg Tyr Ser Ala Thr Gln Leu His Leu His Trp Gly Asn Pro
85 90 95

Asn Asp Pro His Gly Ser Glu His Thr Val Ser Gly Gln His Phe Ala
100 105 110

Ala Glu Leu His Ile Val His Tyr Asn Ser Asp Leu Tyr Pro Asp Ala
115 120 125

Ser Thr Ala Ser Asn Lys Ser Glu Gly Leu Ala Val Leu Ala Val Leu
130 135 140

Ile Glu Met Gly Ser Phe Asn Pro Ser Tyr Asp Lys Ile Phe Ser His
145 150 155 160

Leu Gln His Val Lys Tyr Lys Gly Gln Glu Ala Phe Val Pro Gly Phe
165 170 175

Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr
180 185 190

Arg Gly Ser Leu Thr Thr Pro Pro Cys Asn Pro Thr Val Leu Trp Thr
195 200 205

Val Phe Arg Asn Pro Val Gln Ile Ser Gln Glu Gln Leu Leu Ala Leu
210 215 220

Glu Thr Ala Leu Tyr Cys Thr His Met Asp Asp Pro Ser Pro Arg Glu
225 230 235 240

Met Ile Asn Asn Phe Arg Gln Val Gln Lys Phe Asp Glu Arg Leu Val
245 250 255

Tyr Thr Ser Phe Ser Gln Val Gln Val Cys Thr Ala Ala Gly Leu Ser
260 265 270

Leu Gly Ile Ile Leu Ser Leu Ala Leu Ala Gly Ile Leu Gly Ile Cys
275 280 285

Ile Val Val Val Val Ser Ile Trp Leu Phe Arg Arg Lys Gly Ile Lys
290 295 300

Lys Gly Asp Asn Lys Gly Val Ile Tyr Lys Pro Ala Thr Lys Met Glu
305 310 315 320

Thr Glu Ala His Ala
325
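
As an illustrative cross-check only (not part of the original listing), the following minimal Python sketch translates one legible codon row of SEQ ID NO: 17 (nucleotides ending at position 576) with the standard genetic code and compares the result with residues 177-192 of SEQ ID NO: 18. The names `translate`, `ROW_177_192_DNA` and `ROW_177_192_PROTEIN` are hypothetical, and the sketch assumes the row has been transcribed correctly from the listing.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter amino acid codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> list[str]:
    """Translate a DNA string (whitespace ignored) into three-letter residues."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna), 3)]


# Hypothetical names; the row is copied from the listing (assumed transcription).
ROW_177_192_DNA = "AAC ATT GAA GAG CTG CTT CCG GAG AGG ACC GCT GAA TAT TAC CGC TAC"
ROW_177_192_PROTEIN = (
    "Asn Ile Glu Glu Leu Leu Pro Glu Arg Thr Ala Glu Tyr Tyr Arg Tyr".split()
)

if __name__ == "__main__":
    print(translate(ROW_177_192_DNA) == ROW_177_192_PROTEIN)
```

The same check could be applied row by row to locate codons that were misread during text extraction, since a mismatch between a translated codon and the listed residue flags a likely transcription error.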



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
ACATTGAAGA GCTGCTTCCG G 21

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
AATTTGCACG GGGTTTCGG 19
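
As an illustrative aside (not part of the original listing), the short oligonucleotides of SEQ ID NO: 19 and SEQ ID NO: 20 appear to correspond to opposite strands of the coding sequence in SEQ ID NO: 17. The minimal Python sketch below locates SEQ ID NO: 19 in the codon row for residues 177-192 and the reverse complement of SEQ ID NO: 20 in the codon row for residues 209-224. All variable and function names are hypothetical, and the two rows are assumed to be transcribed correctly from the listing.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Codon rows copied from SEQ ID NO: 17 (assumed transcription), spaces removed.
ROW_177_192 = "AACATTGAAGAGCTGCTTCCGGAGAGGACCGCTGAATATTACCGCTAC"
ROW_209_224 = "GTTTTCCGAAACCCCGTGCAAATTTCCCAGGAGCAGCTGCTGGCTTTG"

SEQ_ID_19 = "ACATTGAAGAGCTGCTTCCGG"  # 21 bases
SEQ_ID_20 = "AATTTGCACGGGGTTTCGG"    # 19 bases

# str.find returns -1 when the substring is absent.
forward_offset = ROW_177_192.find(SEQ_ID_19)
reverse_offset = ROW_209_224.find(reverse_complement(SEQ_ID_20))

if __name__ == "__main__":
    print("SEQ ID NO: 19 found on the listed strand:", forward_offset >= 0)
    print("SEQ ID NO: 20 found as reverse complement:", reverse_offset >= 0)
```

A match on the listed strand for one oligonucleotide and on the complementary strand for the other is the arrangement expected of a primer pair flanking a short region, although the listing itself does not state how the two sequences are used.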

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1363 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

CTGACACCAC TCAGACCGTG TGTGATCTGG CTCAACCAGT TCTGCGATCC CACCCAGGAA 60

CAGAAGACTG CAAGAAAACG TTACTTCAAC CCCCCTGTGA TCCCATCTGC AACCTGACCA 120

ATCAGCACTC CCCAAGTCCC AAGCCCCTAT CTGCCAAATT ATCTTTAAAA ACTCCCCAGA 180

GGCAGGGTGC AGTGGTTCAA CGCCTGTAAT CCCAGCACTT TAGGTGGATC ACGAGATCAA 240

GAGATCAAGA CCAGCCTGGC CAACATGGTG AAACCCCGTC TTCTTACTAA AAATACAAAA 300

ATTAGCTGGG TGTGGCGGCG CGTGCCTGTA ATCCCAGCTA CCCAGGAGGC TGAGGCAGGA 360

GAATCGCTTG AACCCGTGAG GCAGAGGTTG CAGTGAGCCA AGACCATGCC ACTGCATTTC 420

AGCCTGGGCG ACAGAGGGGA ACTCCGTCTG AACAAACAAA CAAACAAACA ACTCCCGGAA 480

TGCTTGGGGA GACTGATTTG AGTACTGGAA TCCCAGTACT TTAGGAGGCC AAGGTAGGTG 540

GATCATTTGA GGTCAGGAGT TCCAGACCAG CCTGGCCAAC ATGGTGAAAC CCCGTCTCTA 600

CTAAAATTAG AAAAATTAGC CGGGTGTGGT GGTGGGCGCC TGTAATCCCA GCACTTTGGG 660

AAGCCAAGGC AGGTGAATTA TCTGAGGTCG GGAGTTTAAG GCCAGCCTTA AACTGGCGAA 720

ACCCCGCCTC TACTAAAAAT ACAAAAATTA TCTGGGCATG GTGGCATGTG CCTGTAATCC 780

CAGCTACTCG GGAGGCTGAG GCAGGAGAAT CGCTTGAACC CGGGAGGCGG AGGTTGCAGT 840

GAGCCGAGAT CACGCTATTG CACTCCGGCC TGGGCAACAG AGCGAGACTC CGTCTCAAAC 900

AAACAAACAA AGGAACGAAA ACTCCGGTCT CCGGCACGGC AAGCTCTGCG TGAATTACTT 960

TCTCCATTGC AACTCCCCTG TCTTGATAAA TGGGCTCTGT CTAAGCAGCG GGCAAGGTGA 1020

ACTCGTTGGG CTGTTACAGG ACCAGTGACA GACCAAGGCA TGCCACTGAA GGAATCCCTA 1080

GACGCACCCT TCTGGATGTG AGGCAGGCGG ATCTCACCCC ACGCCTGCCA GCAGCTCCTC 1140

GGAGAACTGT GTTCCTGGGT CAGCCCTGGC CCAGAGGAGC GCCGGGGACC CGCAGAGTGC 1200

TGCTGAAGTC AAGGCTACAA CTCACCTAGG ATCTGGGGCG CCAGCCTCCG GTGGGCAGGG 1260

CGTTCTCCTC CCCCACCCCC TCCCCGCACG ATGACATCAA GTGTTTGGCG TTGAGTTGCT 1320

CCATAAAAGC TGCCCGGGGA AGCCAGGAGA GCGAAGGGCG GAC 1363

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GTCCACTTGG ATCCGTTCAC TGG 23

WE CLAIM:

1. A substantially purified nucleic acid encoding
the amino acid sequence of HCAVIII depicted in SEQ ID
NO: 2.

2. The nucleic acid of Claim 1 wherein said nucleic 
acid is mRNA.

3. A cDNA encoding the amino acid sequence of
HCAVIII or a portion thereof.

4. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the coding region of the nucleotide 
sequence depicted in SEQ ID NO: 1.

5. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 2. 

6. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the coding region of the nucleotide
sequence depicted in SEQ ID NO: 3. 

7. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 4. 

8. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the nucleotide sequence depicted in 
SEQ ID NO: 12. 

9. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 13. 

10. The cDNA of Claim 3 wherein the amino acid 
sequence is encoded by the nucleotide sequence depicted in 
SEQ ID NO: 14. 

11. The cDNA of Claim 3 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 15.

12. The cDNA of Claim 3 comprising the nucleotide
sequence depicted in SEQ ID NO: 5.

13. The cDNA of Claim 3 comprising the nucleotide 
sequences depicted in SEQ ID NO: 5 and SEQ ID NO: 7.

14. A cDNA encoding the amino acid sequence of
HCAVIII wherein the phosphorylation region has been
mutated.

15. The cDNA of Claim 14 wherein the amino acid 
sequence is encoded by the nucleic acid sequence depicted 
in SEQ ID NO: 17. 

16. The cDNA of Claim 14 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 18.

17. A protein comprising the amino acid sequence of
HCAVIII or a portion thereof.

18. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO: 1.

19. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 2.

20. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO: 3. 

21. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 4. 

22. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic 
acid sequence depicted in SEQ ID NO: 12. 

23. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 13. 

24. The protein of Claim 17 wherein the amino acid 
sequence is encoded by the coding region of the nucleic
acid sequence depicted in SEQ ID NO: 14. 

25. The protein of Claim 17 wherein the amino acid 
sequence comprises the sequence depicted in SEQ ID NO: 15.

26. A protein comprising the amino acid sequence of
HCAVIII wherein the phosphorylation region has been
mutated.

27. The protein of Claim 26 wherein the amino acid
sequence is encoded by the nucleic acid sequence depicted
in SEQ ID NO: 17.

28. The protein of Claim 26 wherein the amino acid
sequence comprises the sequence depicted in SEQ ID NO: 18.

29. A recombinant DNA clone comprising a cDNA of a
HCAVIII transcript isolatable from human A549 cells of
about 1.1 kilobases.

30. An expression vector comprising the nucleic
sequence for HCAVIII or a portion thereof.

31. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 1.

32. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 3.

33. The expression vector of Claim 30 wherein the
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 12. 

34. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the coding region of the 
nucleotide sequence depicted in SEQ ID NO: 14. 

35. The expression vector of Claim 30 wherein the 
nucleic acid sequence comprises the nucleotide sequence 
depicted in SEQ ID NO: 17.

36. A method of detecting cancerous and precancerous
lung tissue comprising:

(a) preparing a section of biopsy tissue;

(b) probing said tissue with a labeled probe
complementary to the cDNA of SEQ ID NO: 1;

(c) removing said probe which has not hybridized to
the tissue; and

(d) detecting the presence of the hybridized probe.

37. A method for detecting lung cancer antigen
specific for non-small cell carcinoma in a human cell
specimen comprising:

a) labeling a DNA probe comprising the genomic clone 
of HCAVIII; 

b) reacting the labeled DNA probe with a human test 
cell specimen and a normal human cell specimen under 
conditions suitable for hybridization of the labeled probe 
to any HCAVIII mRNA which may be present in the test and 
normal cell specimen; 

c) removing unreacted components from the test and 
said normal cell specimens; 

d) detecting the hybridized probe bound to the test 
and normal cell specimens; 

e) quantifying and comparing the amount of hybridized 
probe bound to the test and normal cell specimens. 

38. The method of claim 37 further comprising: 

a) labeling a DNA probe comprising the genomic clone 
of HCAVIII with a substrate which can bind to a detecting 
substance to form a labeled DNA probe; 

b) reacting the labeled DNA probe with a human test 
cell specimen and a normal human cell specimen under
conditions suitable for hybridization of the labeled probe
to any HCAVIII mRNA which may be present in the test and
normal cell specimens;

c) removing unreacted components from the test and 
normal cell specimens; 

d) reacting the test and normal cell specimens with 
a detecting substance which is capable of fluorescing; 

e) comparing the fluorescence of the test and normal 
cell specimens.

39. A method for screening human specimens for
HCAVIII protein, comprising:

a) mixing a human test specimen with a first amount
of an antibody specific for the HCAVIII protein in a first
reaction well;

b) mixing a control lung cancer antigen comprising
at least a portion of the HCAVIII protein with a second
amount of said antibody specific for the HCAVIII protein
in a second reaction well; and

c) detecting whether said test specimen binds to
said antibody as compared to said control lung cancer
antigen.

40. A method for testing a human cell sample for
lung cancer comprising assaying a cell homogenate for
carbonic anhydrase activity.

41. An antibody made by immunizing animals with a
lung cancer antigen associated with non-small cell lung
cancer cells.

42. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO:2. 

43. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 4.

44. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 13. 

45. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO:15. 

46. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ
ID NO: 18.

47. A therapeutic composition for the treatment of
non-small cell lung cancer comprising an antibody to
HCAVIII protein bound to a substance which affects the
ability of said cancer to replicate.

48. The method of claim 47 wherein said substance is
a cancer drug. 

49. The method of claim 48 wherein said substance is 
a radioisotope. 

50. The method of claim 49 wherein said substance
affects gene expression of a gene encoding HCAVIII.

51. A substantially purified nucleic acid comprising
the nucleotide sequence depicted in SEQ ID NO: 7.

52. A cDNA comprising the nucleotide sequence
depicted in SEQ ID NO: 7.

53. A substantially purified nucleic acid comprising
the nucleotide sequence depicted in SEQ ID NO: 21.

AMENDED CLAIMS

[received by the International Bureau on 20 November 1995 (20.11.95); 
original claim 41 amended; remaining claims unchanged (1 page)] 



41. An antibody made by immunizing animals with
HCAVIII, a lung cancer antigen associated with non-small 
cell lung cancer cells. 

42. The antibody of Claim 41 wherein said lung
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 2. 

43. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ
ID NO: 4.

44. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 13. 

45. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 15. 

46. The antibody of Claim 41 wherein said lung 
cancer antigen has the amino acid sequence depicted in SEQ 
ID NO: 18.

US,A, 5,134,075 (HELLSTROM ET AL) 28 July 1992, especially
column 4 lines 34-64.

US,A, 4,816,402 (ROSEN ET AL) 28 March 1989, see entire 
document. 

Gastroenterology, Volume 105, Number 3, issued 1993, Mori et 
al, "The significance of carbonic anhydrase expression in human 
colorectal cancer", pages 820-826, see abstract. 

DNA and Cell Biology, Volume 11, Number 7, issued September
1992, Skonier et al, "cDNA cloning and sequence analysis of
Big-h3, a novel gene induced in a human adenocarcinoma cell line
after treatment with transforming growth factor-beta", pages
511-522, see entire document.

Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. □ Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. □ Claims Nos.:
because they relate to parts of the international application that do not comply with the prescribed requirements to such
an extent that no meaningful international search can be carried out, specifically:

3. □ Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:
Please See Extra Sheet.

1. □ As all required additional search fees were timely paid by the applicant, this international search report covers all searchable
claims.

3. □ As only some of the required additional search fees were timely paid by the applicant, this international search report covers
only those claims for which fees were paid, specifically claims Nos.:

4. □ No required additional search fees were timely paid by the applicant. Consequently, this international search report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest

□ The additional search fees were accompanied by the applicant's protest.

□ No protest accompanied the payment of additional search fees.

A. CLASSIFICATION OF SUBJECT MATTER:
IPC (6):

C07H 19/00, 21/00, 21/02, 21/04; C07K 1/00, 14/00, 17/00, 16/00; C12Q 1/00, 1/68; G01N 33/53, 33/567; A01N
43/04; A61K 31/70

A. CLASSIFICATION OF SUBJECT MATTER:

US CL:

536/22.1, 23.1; 530/350, 387.1; 435/4, 6, 7.1, 7.2; 514/44; 424/85.8

B. FIELDS SEARCHED

Electronic data bases consulted (Name of data base and where practicable terms used):
APS, BIOSIS, CAPLUS, CANCERLIT

search terms: A549, HCAVIII, Human Cancer Antigen VIII, Cell surface antigen, Cell surface marker, Non-small cell
lung cancer, Carbonic Anhydrase

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

This application contains the following inventions or groups of inventions which are not so linked as to form a single
inventive concept under PCT Rule 13.1. In order for all inventions to be examined, the appropriate additional
examination fees must be paid.

Group I, claim(s) 1-39 and 41-53, drawn to nucleic acids encoding the amino acid sequence of HCAVIII, the protein
expressed thereof, antibodies to the proteins and methods using one of the above.

Group II, claim(s) 40, drawn to a method for testing a human cell sample for lung cancer by assaying a cell
homogenate for carbonic anhydrase activity.

Pursuant to 37 CFR § 1.475(d) the additional method(s) beyond the one first method of use are considered to lack unity
and are properly separated.